
Continuous Dis-Integration

Written by Ben Weatherall   

Pretty much everyone is now familiar with Continuous Integration (CI) builds. Fewer are aware of the distinction between those CI builds and Continuous Integration itself. And even fewer are aware of the pitfalls that await the unwary who venture down this path, for it is fraught with unseen perils that can lie dormant until one least expects them. CI can be an adventure story to rival the opening of the American West or the Australian Outback.

What is Continuous Integration?
Continuous Integration is a process of incrementally merging fixes and features upstream as they become “suitable” for inclusion in a release. The diagram below illustrates simple Continuous Integration where fixes or features (from here on referred to as change sets) are worked at the “Sprint” level and then integrated into the “Integration” level. In this first example there is no feedback from Integration back to Sprint, so any and all conflicts have to be resolved at the Integration level. Of course, these levels are symbolic representations of branches or streams. Note that only after a change set is complete and tested is it promoted upstream to the Release level, where it is available for inclusion in the next release. By complete, I mean that there is no dependency between the change set being promoted upstream from Integration and any of the other change sets that have not yet been promoted.


The biggest problem with this approach is that as time goes by, each of the Sprints gets further and further out of sync with each other and with Integration. Consequently, the promotions upstream become more and more difficult over time. This all but requires that the Sprints are “rebased” (brought back into sync) periodically from Integration.

A modification of this approach is illustrated in the diagram below where, after any conflicts are resolved, the resulting change set(s) are brought back into the Sprints in order to keep them roughly in sync with Integration.

Note that in this example, the promotion of the second change set has conflicts that are resolved in Integration, then flow back down to the originating Sprint as well as the others. This means that even though the Sprint promoted a change set that worked with their codebase, it may well require merging and repair when it is returned.


This drift from a common codebase seems to be accepted these days, but from both a quality and an SCM standpoint, it should not be. Every time working code is modified, there is a risk that the modification will break something. There is also a risk that subsequent integrations will drift further from what was intended, but if the regression tests pass, the drift may go unnoticed for quite some time.

There are some Version Control (VC) tools that allow for automatic patching when there are dependent changes in the codebase. In the example above, assume that there is one other dependent change set (as illustrated below). For tools of this type, the user is given an option to:

  1. Take all of the dependent change set along with the desired one (that may not ever be complete and ready for promotion)
  2. Ignore the dependent change set entirely (requiring additional work in Integration that will present problems when the dependent change set is finally promoted or the changes made in Integration are returned to the original Sprint)
  3. Take a “patch” from the dependent change set that only contains the changes necessary to make the desired change set work.

If option 3 is available, and if the tool can keep track of these patches automatically, then it is the most viable and valuable option.
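
Before choosing among the options above, it helps to know exactly which unpromoted change sets a candidate depends on. The following is only a minimal sketch of that check, not how any particular VC tool does it. The change set names and the depends_on mapping are hypothetical, and a real tool would derive the dependencies from its own history.

```python
def blocking_dependencies(change_set, depends_on, promoted):
    """Return the unpromoted change sets that change_set depends on,
    directly or transitively.

    depends_on is a hypothetical mapping: change set -> change sets it needs.
    promoted is the set of change sets already promoted upstream.
    """
    blocking = set()
    stack = [change_set]
    while stack:
        current = stack.pop()
        for dep in depends_on.get(current, ()):
            if dep not in promoted and dep not in blocking:
                blocking.add(dep)
                stack.append(dep)  # follow the chain of dependencies
    return sorted(blocking)


# Hypothetical data: CS3 needs CS2, which needs CS1 (already promoted).
depends_on = {"CS2": ["CS1"], "CS3": ["CS2"]}
promoted = {"CS1"}
print(blocking_dependencies("CS3", depends_on, promoted))
```

An empty result means the change set is “complete” in the sense used earlier and can be promoted on its own. A non-empty result is the list of change sets for which one of the three options has to be chosen.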



There is yet another option, but it is rarely used – fix the integration problems in the original Sprint by reverting any promotion that does not occur cleanly. The problem with this is that it causes perturbation in the Sprint's codebase due to having to accept changes from Integration that they may not be ready for if more than one issue is being worked at a time.

How Does That Differ from CI Builds?
Continuous Integration Builds are builds that result from changes in a codebase. Typically, they result when a developer checks code into the Sprint level from their workspace. There may be a small window of time before the build is initiated to allow for subsequent related check-ins, but after that the build starts. The primary purpose of these builds is to ensure that the codebase still builds. Running a close second is the desire to ensure that a minimum level of regression testing is performed in order to ensure that no existing functionality was broken by the new code.
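
The “small window of time” can be thought of as a quiet period. The sketch below is one possible interpretation and rests on an assumption of mine: every new check-in restarts the window, so a burst of related check-ins produces a single build. The function name and the use of plain seconds as timestamps are illustrative only. Build servers usually expose this as a configuration setting rather than as code.

```python
def build_start_times(checkin_times, quiet_period):
    """Return the times at which CI builds would start.

    Assumption: a build starts quiet_period seconds after a check-in,
    unless another check-in arrives first, which restarts the window.
    """
    starts = []
    pending = None  # time of the latest check-in not yet covered by a build
    for t in sorted(checkin_times):
        if pending is not None and t - pending >= quiet_period:
            # The window expired before this check-in arrived.
            starts.append(pending + quiet_period)
        pending = t
    if pending is not None:
        starts.append(pending + quiet_period)
    return starts


# Check-ins at 0s and 20s share one build; the one at 200s gets its own.
print(build_start_times([0, 20, 200], quiet_period=60))
```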

When code is promoted from a Sprint to Integration, a CI build may also be initiated; however, it is not uncommon to require a deeper level of testing. Since several integrations may be underway concurrently, these builds are often manually initiated when it is felt that “it is time.” Continuous Integration is based on the philosophy that it is best to take each small working change and push it upstream as quickly as possible. This applies to both individuals and features/fixes and, when animated, a diagram of this looks like a party of ants returning to their colony. The biggest difference is the reverse flow of changes back downstream as each feature/fix is accepted.

What Can Go Wrong?
When a codebase is implemented pathologically (in other words, cross- and cyclically-linked code), it becomes almost impossible to make changes to a file in one change set without impacting other changes to the same file in other change sets. There is no cure for code of this type other than evolving it into a more structured (and modular) architecture. One of the challenges of SCM is to determine when a codebase has evolved to the point that it is realistic to switch branch/stream structures and make changes to the way builds are performed. The easiest way to tell that the evolution is working is to trend the number of patches and the amount of rework necessary as change sets are promoted.
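
One simple way to trend that rework is to fit a straight line through the rework count recorded at each promotion and look at the slope. This is only a sketch: what counts as a unit of “rework” (patches, conflicted files, hours) is a local decision, and the numbers below are made up.

```python
def rework_trend(rework_counts):
    """Least-squares slope of rework per promotion.

    rework_counts holds one number per promotion, oldest first.
    A negative slope suggests rework is shrinking over time.
    """
    n = len(rework_counts)
    if n < 2:
        raise ValueError("need at least two promotions to compute a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(rework_counts) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rework_counts))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Made-up data: patches needed for each of the last five promotions.
print(rework_trend([5, 4, 3, 2, 1]))
```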

Waiting too long to promote can cause problems by allowing a Sprint to drift too far from Integration. This is especially true when there are a lot of Sprints working in parallel or there are multiple Integration levels. Code that works just fine in the Sprint may be almost impossible to make work the same at the Integration level, especially if a previous merge contained refactored code.

In fact, refactoring too often is something else that can cause problems. Here, your VC tools can either save you or make you want to jump out of a window. VC tools that “know” what an element's previous name and/or location was and can still perform a viable merge are invaluable. But even these tools suffer from the case where refactoring moves logic from one element into multiple other elements or vice versa. It can get even worse if the code being moved around is repetitive in nature (think of a series of button definitions for a screen). Note that the problem is not that the tools cannot do a merge, but rather that the merge results in incorrect code that may well compile.

The reverse of waiting too long to promote is not pushing accepted changes back downstream often enough. This too allows for codebase drift and makes each subsequent upstream merge more challenging. Here, a non-pathological codebase may hide the problem for quite a while, and the only symptom may be an upward trend in the amount of time, effort, and changes necessary to get each upstream merge working properly. As shown above, “rebasing” after each change set promotion is a good idea. Some feel that this should be done prior to the upstream merge so that the initial integration changes can be made in the Sprint, where the most knowledgeable people are already available.

Trying to do too many things at once is yet another problem. If there is a list of changes being worked in a Sprint at the same time, then being able to merge a single change set upstream is made much more difficult. Again, the more pathological the codebase, the more difficult it is, but since we are talking about Sprint-level work, the changes are most likely interrelated anyway, regardless of the codebase pathology.

What Can One Do?
First and foremost, try to architect the software so that pathology is minimized and modularity is maximized. This will make all merges easier, regardless of when and how they are performed.

Second, and almost as important, control refactoring! This can be done for any codebase, whether it is well architected or not. Try not to have multiple Sprints doing refactoring at any one time unless the changes are relatively isolated. Massive refactoring should be undertaken as the sole change set in a Sprint, and any other Sprints should be involved to the extent that all are well aware of what the others are doing, so that their changes do not step on each other.

This brings up the point that one should foster intra- and inter-team communication. This does not mean to have a lot of boring meetings, but rather that the teams should talk to each other. If the teams are distributed, but share “work time,” then it may be possible to use an IM or IRC channel (note that as an SCM person, I tend to be paranoid and feel that this mechanism should be hosted internally and any external communication be encrypted). Participation will vary as to what is going on and what one is doing at the time. There will be times where most people will be “lurkers” and not “talkers.” This is acceptable since the goal is to keep everyone informed without taking too much time doing it.

Allow enough resources to support integration testing (rapid acceptance/rejection). If the Sprint personnel are responsible for doing the merges to Integration, well and good. If they are also responsible for ensuring that the Integration level is working as it should, then there will be problems due to focused testing and blind faith in regression test suites. It is better to have independent test resources at this level and utilize the Sprint personnel as needed for rework or test validation.

Allocate work in small, independent packets. I know this is a standard part of all Agile methodologies, but they are focusing on the “small” part, not the “independent” part. If there are 10 small tasks, but all of them share code, then they will most likely have to be merged all at once instead of incrementally. Of course, there will always be some overlap, but this is a chance to keep things modular where it makes sense.
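
To see how independent a set of tasks really is, one can group them by the files they touch. The sketch below assumes you can list the files each task changes; the task and file names are hypothetical. Tasks that share a file, directly or through a chain of other tasks, end up in the same group and will probably have to be merged together.

```python
def merge_groups(task_files):
    """Group tasks that touch common files, directly or transitively.

    task_files maps a task name to the set of files it changes.
    Returns a sorted list of sorted task-name lists.
    """
    groups = []  # (task names, files touched) pairs with disjoint file sets
    for task, files in task_files.items():
        names, touched = {task}, set(files)
        remaining = []
        for other_names, other_files in groups:
            if touched & other_files:
                # Shared file: absorb the existing group into this one.
                names |= other_names
                touched |= other_files
            else:
                remaining.append((other_names, other_files))
        remaining.append((names, touched))
        groups = remaining
    return sorted(sorted(names) for names, _ in groups)


# Hypothetical Sprint backlog: only T3 can be promoted on its own.
tasks = {
    "T1": {"a.py"},
    "T2": {"a.py", "b.py"},
    "T3": {"c.py"},
    "T4": {"b.py"},
}
print(merge_groups(tasks))
```

The fewer tasks there are per group, the more incrementally the work can be promoted.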

Code to allow incomplete features to exist in the production codebase. Instead of having to have a full feature added all at once, design and code so that the code lies dormant in the codebase until it is time to be activated. This may entail having to ignore some warning messages about unreachable code, but it does allow for early incremental integration. One of the more common ways of controlling whether the code is executed is by use of environment variables or execution switches. This avoids the warnings once the code is at a level to allow test access and also allows for incremental (functional) testing to be performed prior to taking the code live. When it is time to “activate” the feature, one can either remove the checks so that the code is always present or leave them in and set up a set of “feature activation variables” that control what runs and what does not.
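
As a minimal sketch of the environment-variable approach, the code below keeps an unfinished feature dormant until a switch is set. The FEATURE_ prefix, the bulk_discount feature, and the 10% figure are all made up for illustration; any naming scheme works as long as it is applied consistently.

```python
import os


def feature_enabled(name, environ=None):
    """Return True when the hypothetical FEATURE_<NAME> variable is switched on."""
    env = os.environ if environ is None else environ
    value = env.get("FEATURE_" + name.upper(), "")
    return value.strip().lower() in {"1", "true", "yes", "on"}


def compute_total(prices, environ=None):
    """Sum the prices; the dormant discount path runs only when activated."""
    total = sum(prices)
    if feature_enabled("bulk_discount", environ):
        # Incomplete feature: integrated early, inactive by default.
        total = round(total * 0.9, 2)
    return total


# Dormant by default, active when the switch is set (passed in here for testing).
print(compute_total([10, 20], environ={}))
print(compute_total([10, 20], environ={"FEATURE_BULK_DISCOUNT": "1"}))
```

Passing the environment in as a parameter is a design choice that lets testers exercise the dormant path without touching the real process environment.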

Summary
As always, your mileage may vary, but the intent has been to show that just blindly performing CI (and CI builds) is not sufficient. Someone must know the code to make sure that the “branch and merge” process is appropriate to the codebase being worked. If multiple codebases are being worked independently, then it is entirely possible that the same Agile methodology can be used, but the Branch, Merge and Integration parts may be drastically different.

I would be interested in hearing some success stories (and failure stories) from other SCM types out there. Please post comments if you have time.

About the Author
Ben Weatherall is currently based in Fort Worth, Texas, where he practices Practical CM on a daily basis supporting a modified Agile-SCRUM development methodology. He uses a combination of AccuRev, CVS, Bugzilla and AnthillPro (as well as custom tools). He is a member of IEEE, ASEE (Association of Software Engineering Excellence, the SEI’s Dallas-based SPIN Affiliate), FWLUG (Fort Worth Linux Users Group), NTLUG (North Texas Linux Users Group) and PLUG (Phoenix Linux Users Group).


Comments (2)

Kevin Dietz said:

Hey Ben, Good article.

I've thought about these issues a lot, and decided to develop a product to help address them. Check out http://www.mergemagician.com.

It's an automated merging tool that can be used for both upstream and downstream merging so that you can then tie it to a multi-staged CI system.

It currently supports Subversion and Microsoft Team Foundation Server, and I plan on supporting other systems in the future. It is a commercial tool. Let me know what you think.
 
June 10, 2011

Brad Appleton said:

Hi Ben! Can you provide a reference for the definition of CI that you are using? I ask because your definition seems to differ vastly from the ones I most commonly see in the Agile community, particularly from Martin Fowler and Paul Duvall.

In particular, CI doesn't require the use of (separate) streams, and the notion of a separate sprint "stream" and integration "stream" wouldn't be a part of the normal definition of CI (much less the notion that they would become out-of-sync, since such separation would break the implied flow intended by CI).

Are you perhaps referring to a particular adaptation of CI for large multi-team projects commonly known as "pipelined" CI or "(multi-)staged CI"? If so, could you refer to the source you are using for that definition? It is also critical to know the criteria you are using for what is the difference (e.g., in policy) between the integration stream and the sprint stream, and whether they are "per team" and how/if multiple streams feed into a higher-level stream, as well as the differences in levels/scope of automated/regression testing being done for each stream.

Without that, I can't really determine the context for where/how your adaptation of CI applies to my own experience or that of others I have seen/heard.
 
June 06, 2011
