Principles of Agile Version Control: From OOD to POB
Written by Brad Appleton, Robert Cowham and Steve Berczuk
Wednesday, 19 July 2006 02:46
Last month we looked at several principles of object-oriented design and tried to translate them into principles of version-control for task-based development with workspaces, changes, and baselines. This month we extend our "translation" efforts from OOD principles to codelines, branching and promotion. The result is that we expand our scope from task-based development (TBD) to project-oriented branching (POB).
As we wrote last month, the principles of object-oriented design [1] address the need to minimize the complexity and impact of change by minimizing dependencies through the use of loosely coupled and highly cohesive classes, interfaces, and packages. Classes, interfaces and packages are the logical containers of software functionality. These logical entities are realized in "physical" containers of version control (configuration elements) in the form of files, libraries and components, and are as follows:
Last month we derived from the above several principles related to version control containers in general. This month we wish to go a step further and apply those translations to changes/workspaces, baselines, and codelines to arrive at a concrete set of principles for version control. As a refresher, when translating the OOD terms to version control terms, we concluded the following:
Now let's use these to derive and explore principles of Task-Based Development and principles of change-flow across multiple codelines to support Project-Oriented Branching, which is a natural extension of task-based development at the next level of scale and visibility.
General Principles of Container-Based Versioning
First let's review our more general translations from last month, and refine them a bit (and give them some names). The ones that are still too vague for our taste won't be listed as general principles (but will still be applied to derive more specific ones).
Note that the CEP is simply the DRY Principle being used to create the version-control equivalent of a "class." The "container" is the unit of encapsulation for our purposes. This is what allows us to apply all the other OOD principles to the domain of version-control. The CBDP is the equivalent of dependency-inversion. The container, as the unit of encapsulation, is the thing that we should be depending upon, instead of directly referencing the content or context of the container. The ADP is practically unchanged from its OOD incarnation, and almost not worth repeating, except that dependency management plays an important role in CM. IDIP is a rather interesting translation of the Law of Demeter (LOD) and bears some further explanation ...
Evolution Insulation for Unique Identification
The more specific form of the Law of Demeter states that: An object method should invoke the methods of only the following kinds of objects: a) itself; b) its parameters; c) any objects it creates/instantiates; d) its direct component objects. In particular, an object should avoid invoking methods of a member object returned by another method.
In version-control, the "containers" we use don't really have "methods" beyond the operations provided by the version control tool. However, there are a couple of things that we end up doing with the tool that come pretty darn close to being "methods" or "services" provided to the rest of the CM system and its users: identification and status-accounting.
Identification of a container is the unique name that we give it in the version repository. These would be the names of things like workspaces, codelines & branches, version labels/tags, and change-sets/change-tasks. We usually try to provide descriptive names for these types of containers that will mnemonically suggest their intended meaning and purpose.
When any of these names are used in long-lived queries, metrics, or reports, and serve as unique keys (in a database) for the data used to generate them, it can wreak havoc when one of these names needs to be changed for some unforeseen reason. Certainly any queries, metrics and reports should "program to an interface, not to an implementation" when making use of our containers and their data. And in order to avoid similar types of problems, the names of the containers they use to access that data need to be treated in a similar fashion. This gives rise to an equivalent of the Law of Demeter for configuration identification:
What this means is that things like workspaces, changes/tasks, codelines, etc. should be named after their content and/or their intent. The identifier used should not attempt to describe any context information or association which might change during the lifetime of the object. Sometimes it is easy to go overboard and pack more information than is absolutely necessary into the name of a thing. If the name we give it somehow implies an association with some other object, then it creates an undesirable form of dependency if that association does not last. It creates, at the very least, an inconsistency between what the name suggests and the actual relationship. And if that object and its name are frequently referenced by others, or in other databases, repositories, queries/reports, or scripts, then such an inconsistency can actually break things in the SCM environment.
For example, if a task-branch is used, and if the task is intended to be targeted to a particular release, many may include that release-name in the name of the task branch. But if there is any real possibility that the change might go into a different release instead, then the name would no longer be accurate, and changing the name at that point could wreak havoc if the name was used as a unique identifier of the change in other repositories, databases, reports, etc. Similar problems often arise in an organization's document naming conventions, especially if document names contain organizational identifiers and the organization frequently undergoes renaming or restructuring.
Principles of Task-Based Development
The first two of the above principles (STWP and CHIP) are derived from the Single Responsibility Principle (SRP). Here's how ....
The STWP is basically a statement about multi-tasking. It says a worker in a dedicated workspace should work on only one thing at a time within that space (to avoid the overhead, complexity, and interruption of flow that results from frequent context-switching between multiple tasks).
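The single-task idea can be illustrated with a small Python sketch. The `Workspace` class and the task IDs are hypothetical; they stand in for whatever a given tool calls a private workspace or sandbox.

```python
class Workspace:
    """Hypothetical private workspace that holds at most one active task."""

    def __init__(self, owner):
        self.owner = owner
        self.active_task = None  # ID of the task in progress, if any

    def start(self, task_id):
        # Refuse a second task until the current one is finished.
        if self.active_task is not None:
            raise RuntimeError(
                f"{self.owner} is still working on {self.active_task}"
            )
        self.active_task = task_id

    def finish(self):
        # Completing (e.g. committing) the task frees the workspace.
        done, self.active_task = self.active_task, None
        return done
```

A worker who needs to handle an urgent second task would, under this sketch, either finish the first task or use a separate workspace rather than interleaving both in one place.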
The CHIP is basically saying that development changes should be transparent to the extent that they are intention-revealing. A change is transparent if:
This might be accomplished by associating the change with a record in a tracking "system", such as a feature ("story"), fix or enhancement. There might be other ways to accomplish the purpose of change-transparency as well.
Last month, we concluded that the OCP translated to the concept of preserving the identity or essence of its container. What does it mean to "preserve the essence" of a change? Preserving the "essence" of a change means that if we say a change is in the codeline, we have a way of knowing and showing that it's "in there" (both functionally and physically). We must have a way of detecting and identifying its presence, content, and behavior:
The CHAP goes a step further than the CHIP. The CHIP simply says a change needs to reveal its intent; the CHAP says a change needs to "prove" its existence in form, fit, and function! CHIP is a promise that the change is authorized and planned; CHAP verifies that the promise was fulfilled! Taken together, the CHIP and CHAP are about establishing trustworthy transparency for a change/task. We do this by being able to repeatably and reproducibly demonstrate and report the request, status, and results of a change, as well as the "who+what+when+where" of a change: the former is often stored and managed in the tracking system, and the latter is typically captured within the version-control repository when the changes are checked out and in.
The principles of package cohesion (REP, CRP, and CCP) all seem to point to the notion of a change as a single, atomic unit:
The CHTP is suggestive of patterns like Task-level atomic commits. And last but not least we have the ISP translating into the ever important (and almost too fundamental to mention) principle of incremental development, even for tasks:
In order to be client-valued, the increments need only add some portion of requested functionality that can be successfully built and tested. The presumption here is that the increments may be within a task. But this principle doesn't make that very clear, nor does it make it clear whether change-increments should be separately integrated or not. For this reason, we feel that while the above is certainly important, it's not quite specific enough to claim it is a version-control principle, much less a TBD-specific one!
Principles of Baseline Management
One of the first principles we considered here was some sort of "Single Configuration Principle" that would be an application of the SRP to baselines: a baseline should correspond to one and only one configuration. The single configuration might include other configurations, but its intent is to represent a single configuration for a single purpose. This just didn't seem to be significant enough to warrant having as a principle. If you disagree, please email us and tell us why (we are eager for your feedback).
For baselined versions, identity preservation (our translation of OCP) means that if I "baseline" a version and label/tag the result, then that resulting version name should always correspond to that same set of file versions! It's not okay if it refers to version 10 of file helloworld.c one day, but refers to version 11 of the same file a week later. If I "baseline" a version, it means I'm planning on handing it over to some consumer using that version name, and making a promise that the version name will forever correspond to that very same content every day thereafter for as long as the project & product is still in use. This gives us ...
Damon Poole's Timesafe Property [7] makes a very similar statement, but is more focused on the property of an SCM system that is necessary to preserve a baseline's historical accuracy. It assumes that BLIP is already a "given" and discusses a necessary mechanism to ensure it. From the perspective of dependency management, BLIP is important because it makes a statement about the dependability of baselines and the dependency upon the immutability of the contents associated with the baseline:
Last month, we translated ISP to be defining fine-grained acceptance criteria that are client-specific. For configurations (including changes and baselines), this means that the promotion-lifecycle should define its promotion-levels based on acceptance/readiness criteria for transitioning from one level of user-visibility to another.
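To make the idea of criteria-based promotion levels concrete, here is a minimal Python sketch. The level names, the criteria, and the `Configuration` and `promote` names are all assumptions chosen for illustration; a real promotion lifecycle would define its own levels and checks.

```python
from dataclasses import dataclass, field

# Hypothetical promotion levels, ordered by increasing user-visibility.
LEVELS = ["development", "integration", "qa", "release"]

# Acceptance criteria that must hold before a configuration may ENTER a level.
CRITERIA = {
    "integration": {"builds", "unit_tests_pass"},
    "qa": {"integration_tests_pass"},
    "release": {"acceptance_tests_pass", "release_notes_written"},
}


@dataclass
class Configuration:
    name: str
    level: str = "development"
    passed: set = field(default_factory=set)  # criteria satisfied so far


def promote(config):
    """Move config up one level if the next level's criteria are met."""
    index = LEVELS.index(config.level)
    if index == len(LEVELS) - 1:
        raise ValueError(f"{config.name} is already at the highest level")
    target = LEVELS[index + 1]
    missing = CRITERIA[target] - config.passed
    if missing:
        raise ValueError(
            f"cannot promote to {target}: missing {sorted(missing)}"
        )
    config.level = target
    return config.level
```

Each level here is defined only by what its consumers need to be true before they see the configuration, which is the sense in which the criteria are client-specific.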
One can define numerous levels of quality assurance, but the ones that are important enough to represent new milestones in the evolution of a configuration are when that configuration has successfully transitioned to the next-level consumer in the value-chain! Lastly, for baselines, our interpretation of the package cohesion principles results in equating the granularity of baselining, integration, and even promotion:
Principles of Codeline Management
Our interpretation of the Single Responsibility Principle for Codelines is:
Whereas STWP was about the act of making/creating change, SCP is about the act of integrating change and synchronizing the contents of a container. SCP says that while it is perfectly normal for multiple people to work on many different things during the same period of time, only one source at a time (a workspace or another codeline) should be allowed to transfer changes to our codeline. There should be no concurrent commits attempting to update the same codeline for the same component at the same time! In practice, strict adherence to SCP might be controversial, particularly to those claiming it imposes a severe bottleneck on the flow of development changes when a large number of people are working on the same codeline, or a lot of very small non-overlapping tasks are happening during the same period of time. If we do decide to violate SCP however, we will definitely "pay the price" either by accepting greater risk, or else imposing other process restrictions and rules (or even creating other codelines) to mitigate that risk.
Next up is OCP. It was pretty easy to apply to baselines, but what about codelines? When a change is made to a codeline, it results in a new "current configuration" of the codeline. This new configuration then becomes the basis for all collaborative work stemming from (and eventually integrated back into) the codeline. If the codeline is "broken" as a result of this change, then it breaks for everyone else that (re)uses that configuration. If that happens, we have just disrupted the flow of collaboration and progress for the codeline. We don't want that! So the essence of a codeline is its "flow"; and we wish to preserve the steady flow of collaboration and progress contributing to the value-stream that the codeline represents:
The BLIP was about keeping baselines frozen, but the CLFP is about keeping codelines flowing! Laura Wingerd developed an equivalent rule to the CLFP called the Golden Rule of Collaboration: "Always accept stabilizing changes; Never impose destabilizing changes." [8] We achieve this by establishing a "codeline invariant": a set of collaboration criteria/constraints that the codeline's users can rely upon to be preserved. This is often done using a Codeline Policy [2]. The codeline invariant specifies the required degree of 'C'-worthiness (correctness, completeness, consistency, and cadence) of changes flowing through it to enable the needed amount and rate of collaboration and progress.
Last month we interpreted the LSP as a statement of Evolution Integrity, but we need to determine what a "derived container" is. Within the flow of a codeline, the current configuration is "derived from" its predecessor configuration, as suggested by Anne Mette Hass in [9]. So derived "versions" of a codeline should be substitutable for their base versions. This means committing a new change to the codeline must consistently preserve the "correctness" of the codeline. Since the context of the codeline will be "reused" by the next task or workspace-update, each committed configuration must be no less correct or consistent than its predecessor.
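As a rough sketch of how the serial-commit and "no less correct than its predecessor" ideas discussed above might be enforced together, the following Python models a codeline that serializes commits with a lock and checks a codeline invariant before accepting a change. The `Codeline` class, its `invariant` callable, and the `builds` check are hypothetical stand-ins, not part of any particular version control tool.

```python
import threading


class Codeline:
    """Hypothetical codeline that accepts one commit at a time and
    rejects any change that would violate its invariant."""

    def __init__(self, name, invariant):
        self.name = name
        self.invariant = invariant      # callable: configuration -> bool
        self.configurations = [{}]      # history; the last entry is current
        self._commit_lock = threading.Lock()

    @property
    def current(self):
        return self.configurations[-1]

    def commit(self, change):
        """Apply a change (file -> version mapping) as one serial step."""
        # Only one source may transfer changes at a time.
        with self._commit_lock:
            candidate = {**self.current, **change}
            # The derived configuration must still satisfy the invariant.
            if not self.invariant(candidate):
                raise ValueError(f"change rejected: breaks {self.name}")
            self.configurations.append(candidate)
            return candidate


# Stand-in for "build and test": every file must have a positive version.
def builds(configuration):
    return all(version > 0 for version in configuration.values())


main = Codeline("main", builds)
main.commit({"helloworld.c": 10})
```

A rejected change leaves the current configuration untouched, so the next task or workspace update still starts from a configuration that satisfied the invariant.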
For many development shops, this requirement is just a fancy way of saying "Don't break the build!" The basic tenet is to ensure that each change adds value to the codeline without compromising its quality or steady flow of progress.
The way that ISP applies to codelines is regarding integration milestones within a single codeline. This principle is so commonly known to so many that it seems almost too obvious to bother saying, but it nonetheless needs to be said:
Many of us know the perils of "big bang integration." But it's difficult to know just how frequently and incrementally we should integrate our work and make it visible to "higher" levels of the enterprise. The IIP gives us some general advice, but no specific recommendations. Perhaps the principles relating to evolution granularity can help us? When applying package cohesion principles to codelines, we arrived at:
The CFLIP says that collaboration is the source of value-generation (it constitutes both the source of change and reuse) and that the collaboration for a change is not finished ("closed") until it is built+tested (integrated) so as to be releasable (ready for reuse).
Principles of Branching & Merging
For our interpretation of the LSP, another form of "derivation" in version control is when a new branch or codeline is "branched" from a particular point off of its "parent" codeline. The branchpoint is the baseline (or foundation) configuration for the new branch. The new branch "inherits" all content from its parent codeline while being allowed to evolve separately, and (unless it is a permanent variant) usually diverges from its "parent" or "base" codeline for a limited period of time and either promotes or else propagates changes back to its parent at periodic intervals. When it merges back into its parent, the CLIP mandates that the integrity of the base/parent codeline must once again be maintained. So regardless of whether the child branch is for maintenance purposes or for new development, the parent codeline should be longer-lived than the child, and the child should merge back to its parent (possibly multiple times as well as at completion):
The CLNP actually conveys fundamental advice about the branching structure of a component or product. It says to use a simple, recursive hierarchical structure, similar to the preferred control-flow structure (and indentation format) of statements in a computer program. The CLNP suggests avoiding the continual cascading/staircase style of branching and instead points us in the direction of the Mainline pattern for organizing our branching structures using nested synchronization ("mainlining") [2][10]. The CLNP also seems to share much in common with the portion of Laura Wingerd's Base-Codeline Protocol which says that changes should always flow from child codelines to their base/parent codelines [8]. The other half of the base-codeline protocol addresses the main difference between the two scenarios above: changes flow from parent to child only when the child is for new development (and not when it is for legacy maintenance). This is necessary to maintain the integrity of both codelines while also preserving the relationship between them. The CLNP doesn't go quite this far, but perhaps one of our later principles will.
So if changes should always flow from child-codelines to their parent-codelines (CLNP), then when, if ever, should changes flow in the reverse direction (from parent to child)? In general, we have three basic kinds of codelines and change-flow:
So a codeline basically represents an effort to either maintain the past, coordinate the present, or develop the future. And change-flows either propagate stabilizing change, synchronize shared progress, or promote new value. Note also that both propagation and synchronization are basically updating previous context while promotion is creating new content! The more future-looking ("progressive") codelines should be synchronized from their more "conservative" parents, but the opposite is not true: more conservative (e.g., maintenance) codelines should not be synchronized from the more "progressive" parents. Given our earlier translation of "abstractness" to "conservative", this would seem to be a valid interpretation of "dependency inversion" as applied to codelines and change-flow:
As we might have expected, this is almost the exact equivalent of the second half of Laura Wingerd's Base-Codeline Protocol [8]: changes flow from parent to child only when the child is for new development, that is, when the child is less stable than the parent. The PSP uses the term "more conservative" instead of more "stable" or more "safe." Are stability and "conservativism" the same thing? Not according to our "domain translations" from before: we equated stability with "safety" and abstractness with "conservativism." But the very last OOD principle is the Stable Abstractions Principle (SAP), which we'll discuss a little later on.
Similarly, for codelines, there may be many different temptations to branch off a new codeline, but the ones that are significant enough to truly warrant a separate codeline are when the new branch either:
A prime example of adding/preserving value would be a maintenance branch to support a legacy release. Here it is assumed there is either additional support revenue to be gained, or significant business-loss to be avoided (enough to make it worthwhile to support and maintain the new codeline). An example of preserving flow would be the Release-Line and Active Development-Line patterns. Ideally, one could get away with using only the release-line, but the difference in audience and criteria for the active-line is enough that it would significantly disrupt the flow of the parent release-line, and/or the parent's collaboration requirements would significantly impose too much development friction against the flow of the child development-line. So, for codelines, the ISP gives us some rules for when it is appropriate to create a new branch off of the "current" codeline:
Here, "go with flow" simply means that the existing codeline invariant (its required levels of correctness, consistency, completeness, and its cadence) can't meet the needs of both sets of prospective codeline users. So it needs to split by branching off a separate stream of "change-flow" for the two competing/conflicting sets of users. This would seem to be a restatement of the "branch on incompatible policy" rule from [11]. Other applications of this are evidenced by Task Branches and Private Branches for in-progress changes and exploratory/experimental work, or multiple levels of "active" development lines that may correspond to work being performed and integrated across multiple sites and timezones.
The package coupling principles of OOD translate almost directly to version-control with only slight modification (sometimes "dependency" translates into "flow", but note that change-flow does not imply dependency). We already covered the ADP, so we'll cover the other two here:
In Robert Martin's book [1], when discussing the ADP, SDP, and SAP, he describes some measures of stability and abstractness, and plots them against each other. He then describes what he calls the "main sequence", which is the line through the origin identifying all points of equal abstractness and stability. When we translate these three principles into version-control terms, the so-called "main sequence" seems to directly correlate to Laura Wingerd's "Tofu Scale" of the relative "firmness" of a codeline [8][12].
Conclusion
We hope that these ideas (principles) and discussions prove useful to the reader. We are very interested in feedback on any of these principles (especially the names and accuracy of the "translations" from OOD terms to version-control terms). This set of principles is still a "work-in-progress" and we of course reserve the right to modify them as our understanding of them matures.
References
[1] Agile Software Development: Principles, Patterns, and Practices; by Robert C. Martin; Prentice-Hall, 2002. (See related essays online)
[2] Software Configuration Management Patterns: Effective Teamwork, Practical Integration; by Stephen Berczuk and Brad Appleton; Addison-Wesley, November 2002.
[3] A Software Configuration Management Model for Supporting Component-Based Software Development; by Hong Mei, Lu Zhang, Fuqing Yang; ACM SIGSOFT Software Engineering Notes, Vol. 26, Issue 2; (March 2001), pp. 53-58; ISSN:0163-5948
[4] A Component-Based Software Configuration Management Model And Its Supporting System; by Hong Mei, Lu Zhang, Fuqing Yang; Journal of Computer Science and Technology, Vol. 17, Issue 4; (July 2002), pp. 432-441; ISSN:1000-9000
[5] Container-Based SCM and Inter-File Branching; by Laura Wingerd; 1st BCS CMSG Conference, April 2003 (also see accompanying presentation)
[6] Flexible Configuration Management for a Component-based Software Asset Repository; by Tom Brett; BCS CMSG event: Why Software Asset Management and Configuration Management is essential, March 2004 (also see accompanying presentation)
[7] The Timesafe Property: A Formal Statement of Immutability in CM; by Damon Poole; submitted to the 8th International Symposium on System Configuration Management (SCM-8) in Brussels, Belgium, July 1998.
[8] The Flow of Change; by Laura Wingerd; presented at SD West 2005 and the 2005 Perforce User's Conference.
[9] Configuration Management Principles and Practice; by Anne Mette Hass; Addison-Wesley, December 2002. Chapter 1, "What is Configuration Management?" (available online)
[10] Streamed Lines: Branching Patterns for Parallel Software Development; by Brad Appleton, Steve Berczuk et al.; Proceedings of the 1998 Workshop on Pattern Languages of Program Design.
[11] High-level Best Practices in Software Configuration Management; by Laura Wingerd, Chris Seiwald; Proceedings of the Eighth International Workshop on Software Configuration Management (I-SCM8), Brussels, July 1998; (also presented at the 1998 Perforce User's Conference, June 1998)
[12] Practical Perforce: Channeling the Flow of Change in Software Development Collaboration; by Laura Wingerd; O'Reilly & Associates, 2005. Chapter 7, "How Software Evolves" (available online)
Brad Appleton is an enterprise SCM/ALM solution architect for a Fortune 100 technology company. Currently he helps projects and teams adopt and apply agile development & SCM practices. Brad also authors the Agile CM Environments blog, is co-author of Software Configuration Management Patterns: Effective Teamwork, Practical Integration and of the "Agile SCM" column in CMCrossroads.com's CM Journal, is a regular contributor to "The Agile Journal", and is a former section editor for The C++ Report. Since 1987, Brad has gained extensive experience using, developing, and supporting SCM environments for teams of all shapes and sizes. He holds an M.S. in Software Engineering and a B.S. in Computer Science and Mathematics. You can reach Brad by email at brad@bradapp.net
Robert Cowham has been in software development for over 20 years in roles ranging from programming to project management. He continues his involvement in development projects but spends most of his time on SCM Consultancy and Training. He is the Chair of the Configuration Management Specialist Group of the British Computer Society, has a BSc in Computer Science from Edinburgh University and is a Chartered Engineer (CEng MBCS CITP). You can reach him by email at rc@vaccaperna.co.uk
Last Updated on Friday, 21 July 2006 06:27


