Why is it that we keep revisiting configuration management “best practices”? It is not that they are not well covered. It is because they keep changing and every time we look at the process of development from a different perspective, we learn something new. Over the last few years, I have found that what we do under the guise of CM differs in how we identify the problems to be solved, how we address those problems and even what tools are appropriate.
This month, I want to look at several areas that have not really been covered from the viewpoint of how they affect CM. I have been dealing with each of them, either personally or with others in the CM industry as we try to help each other. They are software architecture, branching and merging, hardware, and documentation.
Software architecture is a combination of the framework the rest of the software operates under and the directory (package) hierarchy the software resides in. This is the part of a software system that is least flexible in that it tends to need backward compatibility and a consistent, documented, stable interface specification. And yes, I use the “s” word even in agile shops. Imagine just how difficult it would be to develop software if the underlying language kept changing – sometimes between builds! This is what it is like to develop software where the core functions keep changing.
So how does this work in an agile world? Several people have commented recently that being agile does not mean no long-term thinking, just no long-term scheduling. There seem to be two practices that work:
- Determine the architectural needs for user stories that are at least one to two sprints out and schedule that work so it is done well before it is needed. The interface documentation needed by developers and testers should be done no later than one sprint cycle before it is needed so it is available for planning.
- Break the product backlog into functional and architectural requirements, where functional requirements are generally represented as user stories. As user stories are prioritized, the architectural requirements they depend on also have to be included in a sprint. This means that the initial sprints tend to consist of fewer user stories and more architectural tasks.
Personally, I like the first approach better. It allows the “architecture” to be treated as a separate product the user stories are dependent on. No user story can be considered for a sprint unless its dependencies, at the architectural level, are satisfied. By defining what is needed for each user story and taking into account their desirability (think implementation priority), it is possible to prioritize the backlog for the “architecture” and develop it using agile methods too.
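The dependency rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `UserStory` class, the story names, and the architectural item names are all hypothetical, and priority is modeled as a plain integer where a lower number means more desirable.

```python
from dataclasses import dataclass, field


@dataclass
class UserStory:
    name: str
    priority: int  # assumption: lower number = more desirable
    needs: set = field(default_factory=set)  # architectural items required


def eligible_stories(stories, delivered):
    """Return the stories whose architectural dependencies are all delivered."""
    return [s for s in stories if s.needs <= delivered]


def architecture_backlog(stories, delivered):
    """Rank undelivered architectural items by the best priority of any story needing them."""
    best = {}
    for story in stories:
        for item in story.needs - delivered:
            best[item] = min(best.get(item, story.priority), story.priority)
    return sorted(best, key=lambda item: (best[item], item))


# Hypothetical backlog used only to exercise the two functions.
stories = [
    UserStory("export report", priority=1, needs={"storage API"}),
    UserStory("login", priority=2, needs={"auth framework"}),
    UserStory("audit trail", priority=3, needs={"storage API", "event bus"}),
    UserStory("about page", priority=4),
]
delivered = {"auth framework"}

print([s.name for s in eligible_stories(stories, delivered)])
print(architecture_backlog(stories, delivered))
```

With this sample data, only "login" and "about page" can be considered for the next sprint, and "storage API" heads the architecture backlog because the most desirable blocked story depends on it.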
Regardless, the way software is physically architected can have a serious impact on the problems experienced within an SCM framework – especially the version control component. If software is modular and hierarchical in nature, in other words not pathological or “spaghetti,” then parallel development is easier. There will likely be fewer conflicts that require merges since, most of the time, changes will be focused on isolated features. When this is not the case, merges often take far longer than planned and produce results that need to be almost completely regression tested. Some of the code that is common to multiple “functions” will artificially appear to be “fragile,” leading to unnecessary refactoring or possibly even unnecessary rewrites.
Branching and Merging
There are really only two reasons to branch: to isolate changes or to create variants. The first is the more common case as it is used to support release (maintenance) branches, parallel development branches, feature branches, etc. In multi-team agile development, each sprint is generally assigned its own branch (for purposes of this article only, consider streams to be branches) under a common parent. What has not been standardized (yet) is whether bug fixes occur on the parent or on one or more branches. Personally, I have found that doing bug fixes on the parent yields a higher quality end result due to less merging. When a sprint completes, its branch is merged back into the parent. In the case of multiple concurrent sprints, they should be merged in one at a time to minimize confusion and, hopefully, to allow sprint-level regression testing after each merge.
This pattern of sequential merges causes its own problems. One of the reasons for doing agile development in the first place is to minimize lost time, so having merge times and sequences dictate scheduling is not really acceptable. The figure below illustrates a simple sequential merge on the left and a similar cascading merge on the right. Cascading merges use intermediate integration branches where no work is actually performed to allow concurrent merges to take place. In this example, all of the branches could actually be initiated from the same point and the decision as to which sprint branch merges to which integration branch can be deferred to merge time. Actually, the integration branch creations themselves could be deferred until they are needed, so long as they are initiated from the same starting point as the sprint branches.
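One way to read the cascading pattern is as a series of rounds in which sprint branches are paired into integration branches, so that the merges within a round do not depend on each other. The sketch below plans those rounds. It is an interpretation of the description above, not a reproduction of the figure, and the branch names and the `integration-N` naming scheme are hypothetical.

```python
def cascading_merge_plan(branches, parent="parent"):
    """Group merges into rounds; merges within one round can run concurrently.

    Each merge is a (sources, target) pair. Integration branches carry no
    development work of their own; they only receive merges.
    """
    rounds = []
    current = list(branches)
    counter = 0
    while len(current) > 1:
        merges, next_level = [], []
        for i in range(0, len(current) - 1, 2):
            counter += 1
            target = f"integration-{counter}"
            merges.append(((current[i], current[i + 1]), target))
            next_level.append(target)
        if len(current) % 2:
            next_level.append(current[-1])  # odd branch waits for the next round
        rounds.append(merges)
        current = next_level
    if current:
        # Final round: the single remaining branch is merged into the parent.
        rounds.append([((current[0],), parent)])
    return rounds


plan = cascading_merge_plan(["sprint-1", "sprint-2", "sprint-3", "sprint-4"])
for number, merges in enumerate(plan, start=1):
    print(number, merges)
```

For four sprint branches this gives three rounds, with the two merges in the first round independent of each other. Because the pairing is computed only when the plan is requested, the choice of which sprint branch goes to which integration branch stays deferred until merge time.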
There is a third pattern of branching and merging, shown below, that allows for continuous change integration. This pattern should only be used when it is guaranteed that the changes will not be reverted. In other words, use it only when any problem that occurs, whether as a result of a merge or as an issue on a working branch, will be fixed instead of abandoned. Even though the illustration shows both working branches merging into the CI branch, it can be that after each working branch merges into the CI branch, the updated CI branch is then merged back into each of the working branches. How and when merges occur depends on the version control tool being used. Tools which inherently “know” what changes have been merged in already make the use of this pattern feasible.
Variants are a much simpler case. This is where customer or platform-specific branches are created. Again, exactly how this is performed and how difficult it is depends on which version control tool is being used, but basically all you are doing is making variant-specific changes on the variant branch and generic changes on a “generic” branch (often the trunk). If “sparse branching” is supported, it is the view that combines the two branches that yields a buildable codebase. If only “full branching” is supported, the generic branch must be periodically merged into each variant branch.
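The sparse branching idea can be modeled as a simple overlay: the variant branch holds only the files that differ, and the view looks there first before falling back to the generic branch. The sketch below uses Python's `collections.ChainMap` for the overlay. The file paths and version labels are hypothetical, and a real version control tool resolves views with its own rules rather than a dictionary lookup.

```python
from collections import ChainMap


def build_view(variant, generic):
    """Combine a sparse variant branch with the generic branch.

    Entries on the variant branch shadow the generic ones; everything
    else comes from the generic branch unchanged.
    """
    return dict(ChainMap(variant, generic))


# Hypothetical branch contents: path -> version label.
generic = {
    "src/core.c": "core v7",
    "src/ui.c": "generic ui v3",
    "README": "readme v2",
}
customer_a = {"src/ui.c": "customer A ui v1"}  # sparse: only what differs

view = build_view(customer_a, generic)
print(view["src/ui.c"])
print(view["src/core.c"])
```

A later generic change to `src/core.c` shows up in the customer view with no merge at all. Under full branching, the variant would hold its own copy of every file, which is why the periodic merge from the generic branch becomes necessary.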
Hardware CM is both simpler and more complicated. It is simpler in that if there are branches at all, they tend to be of the variant pattern and merging as such is generally impossible. It is more complicated in that you tend to version specifications, vendor lists and part numbers in something that ends up looking like a hybrid between version control and build management. The typical bills of material used for constructing a product are very similar to build records and the list of possible component choices is like a dependency-style build engine.
If you keep track of the individual component specifications as well as the dependencies between them, there can be a very large range of possible final configurations. For example, say that there are four parts used to build a widget. Part A can be from vendors one through five. Part B can be from vendors one, two, or six, but if vendor two is used, Part A can only be selected from vendors two or five. Part C can be from vendors three, five, six, or seven, but if vendor three is used, Part B can only be from vendor six. And finally, Part D can be from vendors one through three or nine (vendor eight was recently disqualified), but if vendor nine is used, Part C can only be from vendor three. This not only becomes very complex very quickly, it means that the resulting build, especially if the parts are selected manually, must be validated as a part of the final inspection before a product can be approved for shipping.
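The widget example is small enough to check by brute force. The sketch below encodes the vendor lists and the three conditional rules exactly as stated above, validates a single manually selected build, and enumerates every valid combination. The function names and the dictionary layout are illustrative choices, not part of any particular tool.

```python
from itertools import product

# Approved vendors per part, as given in the example.
ALLOWED = {
    "A": {1, 2, 3, 4, 5},
    "B": {1, 2, 6},
    "C": {3, 5, 6, 7},
    "D": {1, 2, 3, 9},  # vendor 8 was disqualified, so it is simply absent
}


def is_valid(build):
    """Check one build (a part -> vendor mapping) against all the rules."""
    if any(build[part] not in ALLOWED[part] for part in ALLOWED):
        return False
    if build["B"] == 2 and build["A"] not in {2, 5}:
        return False  # Part B from vendor 2 restricts Part A
    if build["C"] == 3 and build["B"] != 6:
        return False  # Part C from vendor 3 restricts Part B
    if build["D"] == 9 and build["C"] != 3:
        return False  # Part D from vendor 9 restricts Part C
    return True


def all_valid_builds():
    """Enumerate every vendor combination that passes validation."""
    parts = sorted(ALLOWED)
    combos = product(*(sorted(ALLOWED[part]) for part in parts))
    builds = (dict(zip(parts, combo)) for combo in combos)
    return [build for build in builds if is_valid(build)]


print(len(all_valid_builds()))
print(is_valid({"A": 1, "B": 2, "C": 5, "D": 1}))
```

Of the 240 raw combinations (5 x 3 x 4 x 4), 128 satisfy the rules. The second call returns `False` because Part B from vendor two requires Part A to come from vendor two or five. The same `is_valid` check is what a final inspection step would apply to a manually selected build.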
Imagine the complexity of building an aircraft! Everything, down to the individual washer, must be accounted for, tracked and checked. When life-critical devices such as pacemakers or drug pumps are considered, even the software used needs to be tracked at a “hardware” level using its part number and release version information.
Documentation - Doing and Controlling
When one starts dealing with documentation, there is always the challenge of knowing what to include, where to put it and whether to control it. There are so many types of documents these days (requirements, interface specifications, user stories, user manuals, online help, program documentation, build records, and the list goes on) that knowing what to put in each of them, especially without repeating content, is something that needs to be planned ahead. When each new class of document is identified, it needs to be factored into the overall planning.
As far as knowing where to put each document, think more in terms of a retrieval system than a storage system. Most people think in terms of storing documents in hierarchies, but when trying to find information on a topic we end up reverting to search engines instead of doing a logical breakdown based on content. Of course, if you have a good search engine that can access documents in a version control or document management system, it may not be as big a problem.
As to whether to control or not control a class of documents, it depends on what they are intended for. If they tie to specific versions of source (or another component or product that is released multiple times), or if regulatory requirements mandate it, then they should be versioned. If they are working documents (think project plans, schedules, etc.), then perhaps they should be versioned. This becomes a joint decision between CM and management. Anything else should be addressed on a case-by-case basis. However, once a decision has been made it should not be changed.
Even though these categories have been discussed often and many books have been written, do not be afraid to look at the world of development from fresh perspectives. This often illuminates why problems have been experienced in the past. People who are new to CM, or those who have been using one development paradigm for most of their working lives, may not have the time or breadth of experience to see whether the way things are developed is affecting their CM world. This is the reason CM Crossroads is such a valuable resource – use it!
It also seems true that CM is rarely invited into the early planning phase of a product's development, where it could provide insights into how to make the development process easier on development. I wish I could give a pithy or succinct pointer on how to resolve this, but I have been fighting it for more than three decades and it still seems to be one of the more common questions asked. All I can suggest is trying to capture some statistics on the time lost due to “unnecessary” merging and duplicate documents (or replicated document “pieces” that have to be independently maintained) and using them to convince management to involve CM before the next product is started.
Finally, while it is true that you can use software CM tools to perform hardware CM, it is generally not practical. Just as management likes to “right size” personnel, use the right tools to do the job.