This month we will discuss some of the difficulties encountered when attempting Continuous Integration for multiple component teams working together to develop a large system. We describe the concept of a Staging Area to help coordinate the teams and stabilize the interdependencies between built versions of components.
Continuous Integration for Component-Based Development
In small team development, the practice of Continuous Integration [2] is an effective technique for keeping everyone on the team coordinated with the latest results of everyone else's changes. Practicing Continuous Update [3] and Private Build [4] in one's Private Workspace [5], as part of a Two-Phased Codeline Commit strategy, helps ensure that workspaces and work tasks stay "in sync", while Integration Builds ensure the team's codeline remains stable, consistent, coherent, and correct.
Any built version of a component that needs to be accessed by internal stakeholders (such as a QA/V&V group) needs to be identified (e.g., using a label/tag). This ensures that anyone who needs to look at it (even after it is no longer "latest and greatest") can easily do so, and know which version of the component and its corresponding source code they are looking at.
In an ideal world, we could build the entire system directly from the sources in a one-step process, for everyone working on any component. Ideally, we would also have the storage capacity, network bandwidth, processing power, and load distribution to build the whole system (or at least incrementally build the whole system) every time before a developer commits their changes to the codeline.
For reasons of build-cycle time, network resource load, schedule coordination (e.g., multiple time zones, or interdependent delivery schedules of components), or others, this is not always feasible. And what happens when I have multiple teams and multiple components, each with its own integration schedule or "rhythm", and each needing to coordinate with a larger-grained system integration strategy? There are many interdependent factors to consider, including:
- Relationships between subteams, and their respective integration "rhythms"
- Build-time dependencies between components (libraries, APIs, etc.)
- Geographically dispersed teams and team-members (and differences in time zones)
- Repository size and performance
- System-build performance and available network resources, etc.
System Integration for Multiple Component Teams
Many large projects and systems require multiple teams of people to work together. In component-based development and elsewhere, it is common practice to see a system partitioned into multiple subsystems and/or components, with a team allocated to each component of the larger system (a component team [1]):
- Each component-team develops a separately buildable part of the overall system, and typically develops and modifies only the source for their component (the component team takes ownership of the component's source code)
- Each component may be used or reused in one or more products of the overall delivered system (or within a product-family).
- Some components may require delivery and distribution of their source code in order to build other components; while other components may require delivery and distribution of only binaries, or binaries and interface definitions (e.g., header files in C/C++).
- The resulting delivered component versions can then be assembled together into the final system (subject to system and integration testing of course).
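Because components depend on one another at build time (libraries, APIs, header files), the order in which their deliverables must become available follows from that dependency graph. The sketch below is a minimal illustration, not a prescribed tool: it assumes a hypothetical dependency map (the component names are invented) and uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which each component is built after everything it needs.

```python
from graphlib import TopologicalSorter

# Hypothetical build-time dependencies: component -> components it needs to build.
DEPENDENCIES = {
    "app": {"ui", "services"},
    "ui": {"core"},
    "services": {"core", "storage"},
    "storage": {"core"},
    "core": set(),
}


def build_order(deps):
    """Order components so each one comes after everything it depends on.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    print(build_order(DEPENDENCIES))  # "core" first, "app" last
```

If two components each need the other's deliverables in order to build, `graphlib.CycleError` is raised, since no build order exists for them.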
Each team should be responsible for ensuring that it delivers working code and executables to other internal stakeholders. When a Two-Phased Codeline Commit protocol is used, each Task-Level Commit to the codeline is essentially a signed and sealed statement that you have successfully compiled and tested the entire component with your changes incorporated.
If the team is large or dispersed enough to actually warrant "subteams", it may become necessary for each component-team to deliver "tested, closed, sealed and signed, versioned binary libraries" to the rest of the component teams. If the repository contains several components, and people build only their components to test their changes (rather than the entire system), then they are ensuring they have "tested, closed, sealed and signed" deliverables only for that one specific component!
Using a Staging Area
A best practice commonly used to coordinate cross-component build dependencies is called a "Staging Area" or "Staging Environment". A Staging Environment is like a "sandbox" or "workspace" reserved for sharing the artifacts that builds and tests depend on (headers, libraries, executables, etc.). It works something like this (note that this is not specific to XP/Agile development):
- Each subteam "does its thing": it builds and compiles its code as it should and commits changes to its repository in the usual fashion
- At agreed-upon points in time, the subteam "delivers" any artifacts that other teams require in order to build (headers, APIs, libraries, configurations, etc.) into the staging area (and ideally some additional level of build/test is done)
Figure 1. Delivering Built Components into a Staging Area
- When building in one's private developer workspace or even a subteam's integration workspace/machine, the compilers and linkers (etc.) point to the "official" staging area for the non-subteam owned artifacts needed for the subteam to build their component.
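A minimal sketch of the delivery step and of pointing a subteam's build at the staging area. It assumes a simple directory layout of `<staging>/<component>/include` and `<staging>/<component>/lib`; the layout, the function names, and the GCC/Clang-style `-I`/`-L` search flags are illustrative assumptions, not a prescribed convention.

```python
import shutil
from pathlib import Path


def deliver(component, artifacts, staging_root):
    """Copy a subteam's deliverables into <staging_root>/<component>/<kind>/.

    artifacts maps a kind such as "include" or "lib" to a list of file paths.
    The previous delivery for this component is replaced entirely.
    """
    target = Path(staging_root) / component
    if target.exists():
        shutil.rmtree(target)
    for kind, files in artifacts.items():
        dest = target / kind
        dest.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, dest)
    return target


def build_flags(owned_component, staging_root):
    """Compiler/linker search flags for every staged component the subteam does not own."""
    flags = []
    for comp in sorted(Path(staging_root).iterdir()):
        if not comp.is_dir() or comp.name == owned_component:
            continue  # the subteam builds its own component from source
        if (comp / "include").is_dir():
            flags.append(f"-I{comp / 'include'}")
        if (comp / "lib").is_dir():
            flags.append(f"-L{comp / 'lib'}")
    return flags
```

This version deletes the previous delivery before copying the new one, so a consumer that builds mid-delivery could see a partial set; a real setup would more likely deliver into a temporary location and swap it in, or restrict deliveries to the agreed-upon points in time.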
Staging Area Implementations and Versions
The issue of versioning comes into play if it is necessary to know which version of a component is the "current" one in the staging area. If it is, then:
- When a subteam build has its executables delivered to the staging area, it also creates a corresponding tag/label, and perhaps writes it to a file (e.g., a README) for that component in the staging area (a simplified form of a "version description document", or VDD).
When versioning is required, it is typically handled in one of two ways:
Staging Directory: a separate directory tree in the repository is used to house any "installed" staging artifacts (staged artifacts). Developers will typically populate their sandbox with the staging area plus the top-level directory for their own component (and do not extract/check out anything else unless and until they need to view the source for something outside the component they own).
Figure 2. Staging Directory Versioned in same Repository as Components
Staging Repository: a separate repository is used to house all staged artifacts. It can therefore accommodate its own separate sets of versions and tags/labels (which has both good points and bad points).
Figure 3. Staging Repository and Separate Component Repositories
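Whichever implementation is used, the tag/label and README step described earlier can be automated as part of the delivery. The sketch below assumes a hypothetical plain-text file named `VERSION.txt` in each component's staging directory and invented tag names; it records the tag and source revision that produced the staged deliverables, and lets anyone ask which version of each component is currently "current".

```python
from datetime import datetime, timezone
from pathlib import Path

VDD_NAME = "VERSION.txt"  # hypothetical name for the simplified VDD file


def write_vdd(component_dir, tag, source_revision, when=None):
    """Record which tag/label and source revision produced the staged deliverables."""
    when = when or datetime.now(timezone.utc)
    lines = [f"tag: {tag}", f"source: {source_revision}", f"delivered: {when.isoformat()}"]
    path = Path(component_dir) / VDD_NAME
    path.write_text("\n".join(lines) + "\n")
    return path


def current_versions(staging_root):
    """Map each staged component to the tag recorded in its VDD file."""
    versions = {}
    for vdd in sorted(Path(staging_root).glob(f"*/{VDD_NAME}")):
        fields = dict(line.split(": ", 1) for line in vdd.read_text().splitlines() if line)
        versions[vdd.parent.name] = fields["tag"]
    return versions
```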
Sometimes the required granularity of access control, administration, and mirroring/synchronization will determine which of the two approaches is best. If each component is already large enough to warrant its own repository, separate from the others, then a separate staging repository is typically used.
Sometimes, for local performance, a staging area might be mirrored or replicated to local sites/storage to cut down on network bandwidth and shorten each site's build-cycle time.
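A minimal one-way mirroring sketch, assuming the central staging area is reachable as a file-system path (e.g., a network share) from the local site. It copies only new or changed files and deletes files that were removed upstream; in practice a tool such as rsync, or the replication features of the version-control tool itself, would usually do this job.

```python
import filecmp
import shutil
from pathlib import Path


def mirror(source_root, mirror_root):
    """One-way sync of the central staging area into a local mirror.

    Copies new or changed files, deletes files no longer present upstream,
    and returns the relative paths that were copied.
    """
    src, dst = Path(source_root), Path(mirror_root)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(p for p in src.rglob("*") if p.is_file()):
        rel = f.relative_to(src)
        target = dst / rel
        # Shallow compare: identical size and mtime (preserved by copy2) counts as
        # unchanged; otherwise filecmp compares sizes and then contents.
        if target.is_file() and filecmp.cmp(f, target):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(f, target)
        copied.append(rel.as_posix())
    for f in sorted(p for p in dst.rglob("*") if p.is_file()):
        if not (src / f.relative_to(dst)).is_file():
            f.unlink()  # removed from the central staging area
    return copied
```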
Making it "Agile"
The Staging Area is a specific technique for separating a (sub)team's build-dependency interface from its implementation for the benefit of the rest of the team(s). A Staging Environment is the component-version "mediator" (coordinator, really) that houses the common interface and necessary artifacts to satisfy build/test dependencies across subteams.
How might we apply an "agile" adaptation of it? Well, the "simple" case is when no separate staging area is needed because the whole "one team" can peacefully co-exist in "one repository", with everyone working at their own "sustainable pace" without unduly impacting the others. Then there is little need to think about subparts and subteams, and the team can focus more easily on "the whole".
Other times, factors of scale rear their ugly heads! These may be issues of system/build scale, organization and organizational process, ownership of computing resources, etc. (Or perhaps not all the subteams are using "agile", and some of them can't tolerate such a high frequency of changes/deliveries from the agile teams into their own part of the repository.)
One of the key problems to solve is when, and how often, a subteam should make a "signed and sealed" delivery into the staging area. If every commit to the codeline is too frequent for a staging delivery, then an arrangement must be negotiated with the other subteams. This is where some agile methods try to "scale" by using a team of teams (e.g., a "Scrum of Scrums") to manage the staging frequency and coordination.
Scaling Continuous Integration up to Continuous Staging
If it is necessary to "scale up" my build process and resources to use a Staging Environment, how might I "scale up" a practice like Continuous Integration to approximate "Continuous Staging" into the staging area? This would avoid, or at least minimize, the need to tag/label every delivery into the staging area, and hence minimize the need to manage build-version dependencies between components and the subteams that work on them. Even if it were no longer practical to use a single repository or a full system build for every commit across the whole team, it might be feasible to:
- Have every commit (or perhaps just a scheduled delivery once or twice daily) "trigger" a delivery to the staging area.
- Have another trigger (or perhaps the same one) detect an update to the staging area and perform the next level of build/link/test (automated, of course) using the current set of items in the staging area.
- If it breaks, take the appropriate course of action and send notifications (just as one does for Continuous Integration at the smaller scale).
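The trigger chain above can be approximated by polling. The sketch below assumes the staging area is a local directory tree and that the next-level build/link/test is an arbitrary command; the command and the `notify` callback are placeholders. It fingerprints the staging area's contents and runs the system-level build only when something new was delivered. A CI server's own triggers would normally replace the polling loop.

```python
import hashlib
import subprocess
from pathlib import Path


def fingerprint(staging_root):
    """Hash every staged file's relative path and contents into one digest."""
    digest = hashlib.sha256()
    root = Path(staging_root)
    for f in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(f.relative_to(root).as_posix().encode() + b"\0")
        # A fixed-length per-file hash keeps path/content boundaries unambiguous.
        digest.update(hashlib.sha256(f.read_bytes()).digest())
    return digest.hexdigest()


def on_staging_change(staging_root, last_seen, build_cmd, notify):
    """Run the next-level build/test if the staging area changed since last_seen.

    build_cmd is a placeholder command list; notify is a placeholder callback.
    Returns the current fingerprint for the next poll.
    """
    current = fingerprint(staging_root)
    if current == last_seen:
        return current  # nothing new was delivered
    result = subprocess.run(build_cmd, capture_output=True, text=True)
    if result.returncode != 0:
        notify(f"System build of staging set {current[:12]} failed:\n{result.stderr}")
    return current
```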
Even in those cases where you might still need to "version" the repository (and component) for what you delivered to the staging area, the staging area itself can be used to manage the current latest and greatest set of "system buildworthy" components and their versions (both source and binaries).
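One way to let the staging area track the current "system buildworthy" set is to keep a last-known-good manifest that is updated only when the system-level build passes. This is a sketch under assumptions: the JSON manifest file and its `{component: {"source": tag}}` shape are hypothetical.

```python
import json
import os
from pathlib import Path


def promote_if_green(candidate, build_passed, lkg_path):
    """Return the last-known-good {component: versions} set, first replacing it
    with candidate only if the system-level build/test passed."""
    path = Path(lkg_path)
    if build_passed:
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_text(json.dumps(candidate, indent=2, sort_keys=True))
        # Atomic replace on the same file system, so readers never see a partial file.
        os.replace(tmp, path)
    return json.loads(path.read_text()) if path.exists() else {}
```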
Component Versioning and Releasing
If the development of all your components results in one coordinated release of a single application or system, then it may be best to version the source files, not the compiled libraries. Even when "versions" are associated with what gets delivered to the staging area, they typically refer to versions of the source that produced the "staged" deliverables.
If, however, your result is really an overall product family or product line of multiple components that feed into multiple products for multiple deliverable systems, then component/library reuse and independent component release schedules may make it necessary to version the binary/library releases.
In the latter case of a product family, each component release is essentially a release of a third-party component to each of the other component teams. The vendor release in this case originates from elsewhere within the organization (rather than from an external supplier), but the underlying business model for reuse and release of components matches that of a third-party vendor/supplier (albeit an internal one).
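When each component release is treated like a vendor release, a consuming team typically pins the exact released version it builds against. A minimal sketch, assuming a hypothetical internal release store laid out as `<store>/<component>/<version>/` and invented version strings:

```python
from pathlib import Path


def resolve_pins(pins, store_root):
    """Map each pinned {component: version} to its released-binaries directory.

    Assumes a hypothetical store layout <store_root>/<component>/<version>/.
    Raises LookupError listing every pin that was never released.
    """
    store = Path(store_root)
    resolved, missing = {}, []
    for component, version in sorted(pins.items()):
        release_dir = store / component / version
        if release_dir.is_dir():
            resolved[component] = release_dir
        else:
            missing.append(f"{component}=={version}")
    if missing:
        raise LookupError("unreleased versions pinned: " + ", ".join(missing))
    return resolved
```

Pinning exact versions keeps each consuming team insulated from the supplying team's release schedule until it chooses to upgrade, much as it would be with an external vendor.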
There are truly external vendor and third-party deliverables, and then there are items that may be internal to your organization but should be regarded as vendor/third-party supplied to your particular product and team. In those cases, versioning the delivered binaries is recommended.
Some shops use a separate Third Party Repository for such purposes. One reason is that its supplier and release schedule are independent of the rest of the application. Another reason is that if most of the elements are binary in nature, it is often desirable to have a distinct storage area with more efficient storage parameters/capacity (and sometimes the repository can be configured so it is "tuned" for performance based on knowledge of the kinds of elements it will predominantly store).
And of course, if you get code delivered from any of those third parties, you probably want to version it along with the delivered binaries unless the binaries can be reproduced from the code you were given (sometimes the source delivered is insufficient for that).
If you have to modify any of that third-party code for your own custom, value-added purposes, you will probably want to use the Third Party Codeline pattern (and also try to get the vendor to incorporate your changes, unless your organization deems them proprietary and is unwilling to submit them back to the vendor).
References
- [1] Object Solutions: Managing the Object-Oriented Project, by Grady Booch; Addison-Wesley, October 1995
- [2] Continuous Integration - Just Another Buzzword?, by Steve Konieczka, Steve Berczuk and Brad Appleton; CM Crossroads Newsletter, September 2003 (Vol. 2, No. 9)
- [3] Codeline Merging and Locking: Continuous Updates and Two-Phased Commits, by Brad Appleton, Steve Konieczka and Steve Berczuk; CM Crossroads Newsletter, October 2003 (Vol. 2, No. 10)
- [4] Build Management for the Agile Team, by Steve Berczuk, Steve Konieczka and Brad Appleton; CM Crossroads Newsletter, November 2003 (Vol. 2, No. 11)
- [5] Software Configuration Management Patterns: Effective Teamwork, Practical Integration; by Stephen P. Berczuk and Brad Appleton; Addison-Wesley, November 2002
Acknowledgements
- Jeff Grigg, from the extremeprogramming mailing list.
Brad Appleton is co-author of Software Configuration Management Patterns: Effective Teamwork, Practical Integration. He has been a software developer since 1987 and has extensive experience using, developing, and supporting SCM environments for teams of all shapes and sizes. In addition to SCM, Brad is well versed in agile development, and cofounded the Chicago Agile Development and Chicago Patterns Groups. He holds an M.S. in Software Engineering and a B.S. in Computer Science and Mathematics. You can reach Brad by email at brad@bradapp.net
Steve Berczuk is an independent consultant who has been developing object-oriented software applications since 1989, often as part of geographically distributed teams. In addition to developing software he helps teams use Software Configuration Management effectively in their development process. Steve is co-author of the book Software Configuration Management Patterns: Effective Teamwork, Practical Integration. He has an M.S. in Operations Research from Stanford University and an S.B. in Electrical Engineering from MIT. You can contact him at steve@berczuk.com. His web site is www.berczuk.com
Steve Konieczka is President and Chief Operating Officer of SCM Labs, a leading Software Configuration Management solutions provider. An IT consultant for 14 years, Steve understands the challenges IT organizations face in change management. He has helped shape companies' methodologies for creating and implementing effective SCM solutions for local and national clients. Steve is a member of Young Entrepreneurs Organization and serves on the board of the Association for Configuration and Data Management (ACDM). He holds a Bachelor of Science in Computer Information Systems from Colorado State University. You can reach Steve at steve@scmlabs.com