This article covers the dynamics that managers must respond to in the period between Code Complete and Ship in a professional software development environment. Through pragmatic principles and hands-on examples from real-world projects, it demonstrates how to plan the endgame, how to target the established quality bar, how to measure progress, how to use steering forces to drive progress, and how to deal with team morale issues during long stabilization efforts.
The endgame in a software development project is the period between the completion of the last feature (a.k.a. Code Complete) and Ship. This period presents interesting challenges because the amount and nature of the remaining work are not precisely known, and team members work from dynamic task lists. How do we arrive at Ship with the required quality in the shortest time possible? Which defects do we still fix? How do we safeguard the targeted quality at the moment we ship? How do we keep team morale up during this demanding period?
This paper presents important lessons learned during the numerous endgames that I have driven in demanding, world-class environments such as Microsoft Corp. Rather than elaborating on optimized testing strategies or metrics models, this paper focuses on pragmatic best practices that have been the key success factors in past endgames.
The key difference between the regular development milestones and the endgame is that tasks are not known beforehand. The number of defects to fix, their severity, and the associated amount of work can only be predicted. Instead, the development team is driven by a dynamic set of tasks, fed primarily by newly found and approved defects. This creates specific challenges for ensuring correct individual priorities and distribution of work, tracking code changes that go into the build, and keeping turnaround time short on testing check-ins.
Even though test resources have a more structured list of tasks to perform at this time (e.g. executing system test passes), a good test team adjusts test focus and tasks based on the test results. Thus the test team, too, deals with a non-static set of tasks.
The ability to control project execution hinges directly on timely, high-quality data about the state of the project. Especially with smaller test teams, it becomes difficult to provide accurate data in real time.
Lesson 1: Plan The Endgame
Despite the many unknowns described above, we do have the ability to plan the endgame to a large extent. Even with the ultimate team target of shipping the product on the planned ship date, it is crucial for the productivity and motivation of the team to split the endgame into sub-phases with specific associated test and fix focus, and metrics targets. This approach is at the same time the key to increasing predictability and decreasing risks, because each phase allows code check-ins up to a specified risk level. And as a proper project management practice, passing intermediate gates provides visibility into progress and into the probability of shipping on time.
Divide into sub-phases
I recommend a division of an endgame period in at least four sub-phases: Unlimited Bug Fixing (UBF), Limited Bug Fixing (LBF), Quiet Period (QP), and Release Candidate (RC).
Unlimited Bug Fixing (UBF) allows the development team to reach maximum throughput on fixing defects. The approval process for newly found defects is lightweight, i.e. simple prioritization of incoming defects in the defect database, and the primary focus of the test team is to ensure that new fixes don't destabilize the build.
Limited Bug Fixing (LBF) is only entered once the unit and integration test passes have been executed and all known severe high-risk defects are in the build. At this time, the "code check-in funnel" becomes narrower through a second approval step before actual fixes are made. This second approval for a defect happens once the content of the fix has been determined. Driving factors are: the risk of breaking code, the estimated effort to fix the defect and the amount of (re)testing required. During this phase, the code base is stable enough for an efficient system test pass and for measuring higher Mean-Time-Between-Failure values.
The purpose of the Quiet Period (QP) is to increase and confirm our knowledge about the state of the system, by testing for a number of days (e.g. 3 days) on an unmodified executable. Only defects with extremely high severity (called "showstoppers") are fixed at this time, and only if the risk is sufficiently controllable. Successfully passing this period, i.e. without finding showstopper defects, demonstrates in combination with metrics whether the software is ready to be shipped. If showstoppers are found, the QP period restarts with a new countdown.
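The restart-on-showstopper countdown described above can be modeled with a few lines of code. The following is a minimal sketch (function and parameter names are illustrative, not from any real tool): it walks through per-day showstopper counts and reports after how many calendar days the Quiet Period is finally passed.

```python
def quiet_period_days(required_quiet_days, daily_showstoppers):
    """Return the total number of days spent in the Quiet Period, given
    a per-day list of showstopper counts. Any showstopper restarts the
    countdown; None means the QP has not yet been passed."""
    consecutive_quiet = 0
    total_days = 0
    for found in daily_showstoppers:
        total_days += 1
        if found > 0:
            consecutive_quiet = 0  # showstopper found: restart the countdown
        else:
            consecutive_quiet += 1
        if consecutive_quiet >= required_quiet_days:
            return total_days
    return None
```

For example, with a 3-day QP and a showstopper found on day three, the team pays for it with three extra days: `quiet_period_days(3, [0, 0, 1, 0, 0, 0])` returns 6.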
At the exit gate of the QP, the Release Candidate (RC) is cut, and preferably sent out to all beta users for a final test. The focus is on production testing, and surprises should normally not happen at this time. In the more aggressive projects where criteria in earlier sub-phases were compromised, I have seen the number of release candidates go up to five, i.e. additional QP-RC mini-cycles had to be executed.
Besides the division into sub-phases, establishing metrics projections is a key condition for a well-controlled endgame. I recommend every engineering team to review historic data from past projects in order to make reasonable projections. Specific parameters include defect densities (number of defects/KLOC), defect trend slopes prior to and after Code Complete, individual find and fix rates, and approval rates. With these parameters, establish a "Defect Convergence Model" in which you start with the current number of defects and set intermediate targets for the number of active defects that you expect to have. During execution of the endgame, actual defect counts are compared to this reference trend and the delta is strictly tracked. Exceeding the projected trend provides an early warning that corrective measures are needed in order to hold the ship date.
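A Defect Convergence Model of the kind described above can be sketched in a few lines. The sketch below is an assumption about one plausible shape of such a model, not a prescribed formula: the daily find rate decays geometrically after Code Complete (reflecting the downward trend slope), the fix rate is capped by the number of active defects, and the output is the reference trend against which actual counts are compared.

```python
def convergence_model(active_start, find_rate, fix_rate, decay, days):
    """Project the active-defect count day by day.

    active_start -- active defects at Code Complete
    find_rate    -- defects found on day one (decays geometrically)
    fix_rate     -- maximum defects the team fixes per day
    decay        -- daily decay factor for the find rate (0 < decay < 1)

    All parameter values must be calibrated from historic project data.
    """
    active = active_start
    projection = [active]
    for day in range(days):
        found = find_rate * (decay ** day)          # new incoming defects
        fixed = min(fix_rate, active + found)       # cannot fix more than exist
        active = active + found - fixed
        projection.append(round(active, 1))
    return projection
```

With illustrative numbers, `convergence_model(120, 10, 15, 0.9, 3)` yields the reference trend `[120, 115.0, 109.0, 102.1]`; during the endgame, the daily delta between the actual active count and this trend is what gets tracked.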
Lesson 2: Deal With The Risks
At the beginning of this paper, a number of particular endgame challenges have been discussed. Many of the endgame risks come from these challenges. In most cases, one or more practices do exist that can help avoid or reduce potential impact.
Stability risks come from unknowns with respect to defects in the system. How many defects are still out there? And for known defects that we decide not to fix, how many of these will hurt us after ship?
The very first, fairly obvious, practice is to avoid defects. Although not totally realistic, defects–and thus added risks–can definitely be reduced by proper software development practices. Three specific practices that I have found to add high value in reducing defects are spec reviews, structural unit tests, and code reviews.
Although one often considers the remaining unknown defects as the primary stability risk factor, risks also come from not understanding known defects. This occurs especially with defects raised by beta users. It is therefore essential that every single defect–whether detected internally or externally–is investigated seriously. For the known defects, it is also crucial that the right defects are fixed. Which defects are approved for fixing, which are postponed? This is ultimately what determines how well we control the risks and how well the delivered quality is tuned to the targeted quality bar. Thorough understanding of both the severity and the content of the fix is needed in order to make the right calls, i.e. these decisions should be taken by someone who understands both system operation and development issues.
Stability risks from unknown defects are a main cause of concern for many QA and Engineering managers. However, in most cases this fear stems from weak test coverage or a lack of depth during system tests. The key practices for reducing these risks are to set a smart test strategy focused on high-risk test areas, to track test execution properly, and to generate solid metrics.
Even when stability risks are fully under control, the endgame can still fail because end dates are not met. As with stability risks, let us first look at some root causes.
Failures are often due to the number of defects and the associated defect-fixing effort being severely underestimated. A good developer is able to indicate the destabilization risk and the quality of a piece of code. A good tester is able to give a fairly accurate indication of how stable an area of ownership is. And reviewing metrics data from previous projects supports the creation of realistic estimates.
Another common root cause of schedule slippage is that we fix too many defects: too many defects are approved for fixing, the regression ratio (i.e. new defects introduced while fixing a defect) is too high, or code is churned that should not be touched at all.
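The regression ratio mentioned above is straightforward to compute from the defect database. A minimal sketch, assuming each checked-in fix carries a count of the new defects it introduced (the record format is illustrative):

```python
def regression_ratio(fixes):
    """Compute the regression ratio: new defects introduced per fix.

    fixes -- list of (defect_id, regressions_introduced) tuples, one
             per fix checked into the build.
    """
    total_fixes = len(fixes)
    regressions = sum(count for _, count in fixes)
    return regressions / total_fixes if total_fixes else 0.0
```

For instance, `regression_ratio([("D1", 0), ("D2", 1), ("D3", 0), ("D4", 1)])` is 0.5, i.e. every second fix breaks something: a clear signal to narrow the check-in funnel.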
The most effective measure to reduce schedule risks is to set and execute on short-term targets, and be disciplined in what we fix and how we fix. Not only do short-term targets boost productivity and morale, they also provide intermediate measurable checkpoints.
Lesson 3: Drive Proactively
My personal favourite practice for successfully executing an endgame is rather simple but all too often forgotten: drive your team every day, and do it in an anticipatory way. The endgame, although partially dictated by the defects that are found, does not have to be a reactive game.
Drive the fixing of defects
By being attentive to incoming defects, by prioritizing and investigating with quick turnaround, and by distributing defects to the right people, a team reaches substantially higher yields than in an "un-driven" environment. Removing obstacles and resolving dependencies early is another aspect of driving effectively. For team-level driving, I have found it very effective to communicate metrics daily to the team, including interpretation of results and reiteration of focus and priorities. On the metrics side, it is important to include both the hard defect counts and the qualitative feedback from your testers. Especially with testers who have ownership of specialized areas, qualitative metrics provide extremely valuable additional indicators.
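A daily team-wide status that combines hard counts with per-area qualitative confidence can be assembled mechanically. The sketch below is purely illustrative (field names and report layout are assumptions, not a prescribed format); the point is that both kinds of signal appear side by side every day.

```python
def daily_status(date, counts, qualitative):
    """Assemble a one-screen daily endgame status message.

    counts      -- dict with hard defect counts (active, found, fixed)
    qualitative -- dict mapping product area to tester confidence
    """
    lines = [
        f"Endgame status {date}",
        f"  active: {counts['active']}  "
        f"found today: {counts['found']}  fixed today: {counts['fixed']}",
    ]
    for area, confidence in qualitative.items():
        lines.append(f"  {area}: tester confidence {confidence}")
    return "\n".join(lines)
```

Sent out each morning, such a message keeps the whole team calibrated on both the numbers and the testers' gut feel about weak areas.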
Drive the finding of defects
Testing efforts are also much more effective when driven actively. By carefully observing the type and area of defects, useful information can be derived about weak product areas, risks that were not known, or test types that have not been executed sufficiently well.
A key practice for enabling the effective execution of regression testing new fixes is the use of so-called Test Release Documents (TRDs). A TRD is a short note written by a developer with a clear description of the content of the fix, the scenarios that are affected, and other areas that need to be retested.
Lesson 4: Ship With Discipline
It is worth enumerating a few specific practices for the release candidate (RC) sub-phase. Even though the bits are frozen at this point in time, quality can be seriously compromised and schedule time can be lost.
The most important advice for cutting RC-1 is to only call a build an RC when it is truly an RC. By the time the team arrives at this point, it is anxious to ship the release, and an RC-1 is often cut too early. If this is indeed the case, the result is usually a series of additional release candidates, with a substantial loss of time on the iterations. A simple solution is to go into RC mode only if predefined criteria are met with respect to test coverage, defect metrics, and MTBF data. The second practice is to use the Quiet Period properly prior to cutting the RC, in order to confirm the readiness of the current build.
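Making the RC entry criteria explicit and machine-checkable removes the temptation to declare an RC by gut feel. A minimal sketch, with hypothetical threshold values that a real project would take from its own quality bar:

```python
# Hypothetical RC entry criteria -- real thresholds come from the
# project's quality bar and historic metrics.
RC_CRITERIA = {
    "test_coverage": lambda m: m["coverage_pct"] >= 95.0,
    "no_active_showstoppers": lambda m: m["showstoppers"] == 0,
    "mtbf": lambda m: m["mtbf_hours"] >= 72.0,
}

def ready_for_rc(metrics):
    """Return the names of criteria that fail for the current build.
    An empty list means the build may be declared a Release Candidate."""
    return [name for name, check in RC_CRITERIA.items() if not check(metrics)]
```

A build with 97% coverage, zero showstoppers, and 80 hours MTBF passes (`ready_for_rc(...)` returns `[]`); anything else returns the list of failed gates, which doubles as the agenda for the next triage meeting.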
Finally, do verify everything that changes and everything that has human involvement. It is very easy to assume that once the RC is cut, the team is ready. There is still potential for failure, even if the only remaining change is the change of a software label or the manufacturing of CDs.
Lesson 5: Address The Human Side
Endgames can be extremely demanding on a team, especially if the time span is long and the ship date is aggressive. I therefore consider it a worthwhile practice to insert team morale boosters on a regular basis. It's a totally human practice, with amazing returns.
Inserting fun events can even be combined with useful software development practices. A concrete example is to organize regular bug bashes, in which teams compete informally to find as many defects as possible. Or to give away bug bounties for every xth defect detected. Short-term targets are also an excellent way to create the important feeling of accomplishment on a regular basis, even if the ship date is still far ahead.