SCM Patterns: Building on Task-Level Commit


“Dad,” asked a young man, “my lady friends keep talking to me about being ‘involved with’ them versus being ‘committed to’ them. What exactly is the difference between involvement and commitment?”

“What did you have for breakfast, son?” his father replied.

“Bacon and eggs like always. Why do you ask?” said the son.

“Bacon and eggs, my boy, is a perfect illustration of the difference between involvement and commitment: the chicken was involved, but the pig was committed!”

The task-level commit (TLC) pattern is used to bring the concept of a change set to the check-in process. Essentially, TLC is used to group a set of related changes to separate objects into a single task level operation. This provides a much more intuitive delivery model for code changes, and reduces various kinds of configuration and dependency errors. If your tool does not provide support for change sets or something similar, you can still implement the TLC pattern using scripts.

Note that there are commercial products that provide this support, although the naming may be different. Aside from task, you may hear the terms change set or change package. The terms differ in nuance, but for our purpose the differences are insignificant: either you have this capability built in, or you don’t.

This article assumes you can figure out how to perform basic TLC in your tool and discusses what you can build on top of that foundation: task based development.

Task-Based Development
The best use of TLC is in a fully task-based development cycle. In this model, all changes use the TLC approach, so there is always a task identifier for each change. An attempt to mix TLC with regular file-based development will degenerate into a chaotic muddle due to the inconsistent availability of data.

Tasks are natural elements for change tracking. It is important to distinguish between change set tasks and project management tasks. Generally, your project manager has tasks on her project plan with much less granularity than the coding tasks. CM and project management products are different, and so they use the same word differently. Explain this to developers and managers alike: there are always a few management tasks that coincide perfectly with coding tasks, so you may have to search for a counter-example.

Task Philosophy
The basic task philosophy is that tasks group together changes to files and directories. Instead of dealing with check in and checkout on a file-by-file basis, TLC supposes that all of the changes are delivered as part of a single larger unit, the task.


This means that if 20 files are being changed, none of them should be checked in until all 20 are checked in. The corollary to this is that you shouldn’t change 20 files together unless they need to change as a group.
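The all-or-nothing rule can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical in-memory `Repository` class and a `locked` set standing in for whatever would make a real check-in fail; none of these names come from a particular CM tool.

```python
from dataclasses import dataclass, field


class CheckInError(Exception):
    """Raised when any file in a task cannot be checked in."""


@dataclass
class Repository:
    # Hypothetical in-memory store: path -> list of (task_id, content) versions.
    versions: dict = field(default_factory=dict)
    # Paths that cannot be checked in right now (for example, held by someone else).
    locked: set = field(default_factory=set)

    def commit_task(self, task_id, changes):
        """Check in every file of the task, or none of them."""
        # Validate the whole task first so a failure leaves nothing half-delivered.
        blocked = sorted(path for path in changes if path in self.locked)
        if blocked:
            raise CheckInError(f"task {task_id} blocked by: {', '.join(blocked)}")
        for path, content in changes.items():
            self.versions.setdefault(path, []).append((task_id, content))


repo = Repository(locked={"exit.c"})
try:
    repo.commit_task("T2", {"main.c": "v7", "exit.c": "v5"})
except CheckInError as err:
    print(err)        # task T2 blocked by: exit.c
print(repo.versions)  # {} because main.c was not delivered on its own
```

Checking every file before writing any of them is the simplest way to get this behavior from a script; a tool with real transactions would do the same job more robustly.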

Tasks Should Be Fine Grained

Because of the coupling among all the files of the task, TLC is generally implemented using very fine-grained tasks. Tasks are not a surrogate for change requests. They should be comprised of short, simple steps, each of which can be confirmed to work. In this regard, TLC is well suited for use with an agile development cycle. The rapid turnover of agile project development matches the rapid cycle of task based development.

One Keyboard, One Task

One additional note about allocating tasks: each task should be accomplished by a single developer. Specifically, if more than one developer is needed to perform some work, create multiple tasks. When a certain level of confidence is achieved, all the files associated with a task can be promoted. You can see that task based work tends to foster an ‘all or nothing’ approach: once you start managing with tasks, you want to manage everything with tasks.

The final factor to consider when implementing a task based approach to software change is the promotion model. When code is checked in as individual files, there is generally a test run to confirm the function of the individual file. Likewise, when TLC is used all the files that were delivered together should be tested together.

After initial testing (usually in a ‘smoke test’ or ‘nightly integration’ build), the task and its associated changes move through the software cycle together, based on the assessment of the task as a whole rather than of any particular file within it: the task is only as good as its worst component.

Implementation
Implementing task level commit will require some kind of element to use as your task tracking object. This can be a mechanically generated label (for the CVS alikes), or it can be a change request written at a low level.

If you are using the label approach, choose a single scripting language that you can use across all your platforms. If your shop uses strictly Windows, or strictly Unix, then you have a plethora of choices. If you need both, however, then Perl is the obvious choice. Regardless of language, develop your scripts with a common interface. In this regard, a GUI won’t hurt, but you’ll probably be better off coding a Web interface: that’s one more set of things you can do from home.

If you have a tool like Borland’s StarTeam that provides built-in change request (CR) tracking, and if your organization will accept the change, it’s possible to use the CR tool as a basic task manager. This is especially true if the integration with the version manager is good, and if the tool provides a way to simplify the lifecycle of the CR objects.

Lifecycle

A task moves through three states: working, finished, and withdrawn.

·       The working state, obviously, corresponds to a developer checking out files and following the edit/compile/debug cycle.

·       The finished state is entered when a developer marks the task finished. The files associated with the task should be checked in, and later parts of the development process can now begin to work with the change task.

·       The withdrawn state is used when the developer, a tester, or the build manager realizes there is a problem with the task that interferes with other work. When other work cannot be accomplished, the troublesome task must be withdrawn. If there is a problem with the task that does not interfere with other work, then there is no reason to withdraw it.
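The working, finished, and withdrawn states can be enforced with a small transition table. The sketch below is one possible reading: the article does not say where a withdrawn task goes next, so the withdrawn-to-working edge is an assumption, as are the names `TaskState`, `ALLOWED`, and `advance`.

```python
from enum import Enum


class TaskState(Enum):
    WORKING = "working"
    FINISHED = "finished"
    WITHDRAWN = "withdrawn"


# Assumed transition table. The withdrawn -> working edge is a guess at how
# rework on a troublesome task would start; adjust it to your own process.
ALLOWED = {
    TaskState.WORKING: {TaskState.FINISHED},
    TaskState.FINISHED: {TaskState.WITHDRAWN},
    TaskState.WITHDRAWN: {TaskState.WORKING},
}


def advance(current, target):
    """Return the new state, or raise if the lifecycle does not allow the move."""
    if target not in ALLOWED[current]:
        raise ValueError(
            f"cannot move a task from {current.value} to {target.value}"
        )
    return target


state = advance(TaskState.WORKING, TaskState.FINISHED)
print(state.value)  # finished
```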

Configuration Automation

Using a task based approach to development requires a simple mechanism for changing your software configuration based on the contents of a task.

There Is No Such Thing As One Task

If you try to do task based development, you need to ask, “What if I need to roll back a task?” Generally, software CM tools aren’t designed to do subtraction: they want to add things to a baseline. As a result, a task-based configuration is expressed as:

current configuration = baseline + change list

Where the baseline is some reproducible fixed point, and the change list is a list of changes (tasks) that can be recomputed based on the addition/removal of elements. This means that rolling back a task consists of removing it from the change list and then recalculating the current configuration.
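As a rough sketch of that formula, the function below recomputes a configuration from a baseline and a change list, and shows rollback as removal followed by recomputation. The function name, the data layout, and the baseline versions 1.6 and 1.4 are illustrative assumptions; the simple "later task wins" rule is also a simplification.

```python
def compute_configuration(baseline, change_list, task_files):
    """Return path -> version for baseline + change list.

    baseline:    path -> version at the fixed point
    change_list: task ids, applied in order (later tasks win in this sketch)
    task_files:  task id -> {path: version} delivered by that task
    """
    config = dict(baseline)
    for task_id in change_list:
        config.update(task_files[task_id])
    return config


# Hypothetical data for illustration only.
baseline = {"main.c": "1.6", "exit.c": "1.4"}
task_files = {
    "T2": {"main.c": "1.7", "exit.c": "1.5"},
    "T3": {"main.c": "1.8", "save.c": "1.11"},
}
change_list = ["T2", "T3"]
full = compute_configuration(baseline, change_list, task_files)

# Rolling back T3 is not a subtraction: drop it from the list and recompute.
change_list.remove("T3")
rolled_back = compute_configuration(baseline, change_list, task_files)

print(full)         # {'main.c': '1.8', 'exit.c': '1.5', 'save.c': '1.11'}
print(rolled_back)  # {'main.c': '1.7', 'exit.c': '1.5'}
```

Because the configuration is always rebuilt from the baseline, nothing has to be "undone" in the repository itself.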

Each separate configuration will require a separate change list, and may in fact use a separate baseline. Typically, the baseline will be a common one used by the entire development team – synchronizing changes becomes considerably more difficult, otherwise.

So every developer workspace, every nightly build area, every test configuration will have a data set consisting of the baseline identification (or the name of a communal baseline identification link) plus a work area specific change list. In general, the baseline identifier is just a label, while the change list is a list of task identifiers (numbers, or alphanumeric strings).

The task based configurator will have to compute the configuration by determining the changes associated with each task, plus the file versions contained in the baseline. The latest version of any given file is included in the configuration. To decide which version is latest, use the file modification time if your system is file based, otherwise use the task finished time; the objective here is to provide a quick discriminator.

Respect the Baseline

Remember to compare the version associated with the baseline to all the other versions.  When you are rebaselining your configuration by moving the baseline forward, it’s likely that some change lists will include ‘out of date’ tasks that have been superseded by the new baseline. You wouldn’t want to skip today’s latest version because yesterday it was old.
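One way to respect the baseline is to let its version of each file compete on the same terms as every task's version. The sketch below assumes each version carries a comparable timestamp (plain integers here) and uses hypothetical names and data throughout.

```python
def select_versions(baseline, tasks):
    """Pick the newest version of each file across the baseline and all tasks.

    baseline: path -> (version, timestamp of that version)
    tasks:    list of (finished_time, {path: version})
    """
    chosen = dict(baseline)
    for finished, files in tasks:
        for path, version in files.items():
            # The baseline entry is compared like any other candidate, so a
            # stale task cannot displace a newer baseline version.
            if path not in chosen or finished > chosen[path][1]:
                chosen[path] = (version, finished)
    return {path: version for path, (version, _) in chosen.items()}


# Hypothetical data: the baseline was moved forward to versions stamped 30.
new_baseline = {"main.c": ("1.8", 30), "exit.c": ("1.5", 30)}
tasks = [
    (20, {"main.c": "1.7"}),  # out of date: superseded by the new baseline
    (40, {"exit.c": "1.6"}),  # newer than the baseline, so it wins
]
print(select_versions(new_baseline, tasks))
# {'main.c': '1.8', 'exit.c': '1.6'}
```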

Advanced Configuration
Once you have the basics of task configuration automated, you may want to consider some lifecycle automation. It will help if you have your tasks in a database, or in a storage area that you can treat as a database. For example, Perl’s DBI provides interfaces that can treat flat files and spreadsheet CSV files as simple SQL databases.
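The same idea works outside Perl. The sketch below uses Python's standard `csv` and `sqlite3` modules rather than DBI: it loads a CSV task list into an in-memory SQLite table and queries it with SQL. The column names and sample rows are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical task file contents; in practice this would be read from disk.
TASKS_CSV = """task_id,state,summary
1,working,change about dialog colors
2,finished,add fast exit behavior
3,finished,fix PTR 7310
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (task_id INTEGER, state TEXT, summary TEXT)")

# DictReader yields one dict per row, which matches the named placeholders.
rows = csv.DictReader(io.StringIO(TASKS_CSV))
conn.executemany("INSERT INTO tasks VALUES (:task_id, :state, :summary)", rows)

finished = [
    row[0]
    for row in conn.execute(
        "SELECT task_id FROM tasks WHERE state = 'finished' ORDER BY task_id"
    )
]
print(finished)  # [2, 3]
```

Unlike DBI's flat-file drivers, this copies the data into SQLite instead of querying the file in place, so changes would need to be written back out separately.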

There are patterns to the configurations used by developers, build managers, and testers. Your task lifecycle should make it easy to implement these patterns automatically, so that developers by default see “All approved tasks plus all tasks being worked” while testers see “All tasks that have passed the nightly build”.
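These default views reduce to a lookup from role to lifecycle states. The state names `approved`, `working`, and `built` below are assumptions standing in for whatever your own lifecycle defines, and `default_change_list` is a hypothetical helper.

```python
# Assumed mapping from role to the task states that role sees by default.
VIEW_STATES = {
    "developer": {"approved", "working"},
    "tester": {"built"},  # "built" stands for "passed the nightly build"
}


def default_change_list(role, tasks):
    """Return the task ids a given role should have configured by default.

    tasks: task id -> current lifecycle state
    """
    wanted = VIEW_STATES[role]
    return [task_id for task_id, state in tasks.items() if state in wanted]


tasks = {"T1": "approved", "T2": "working", "T3": "built"}
print(default_change_list("developer", tasks))  # ['T1', 'T2']
print(default_change_list("tester", tasks))     # ['T3']
```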

Your software cycle will determine what the patterns are, but once you recognize them they should stay consistent. Automate these, add the scripts to your build management web site, and the developers have one less thing to break.

Task Based Builds
Builds aren’t any different when using task based development (TBD).

What Changes Are: the Reporting and the Results

Specifically, if a build succeeds you can automatically advance the tasks that participated in the build to a higher state in the lifecycle. Don’t do this for developer configurations, as they may not be finished working.

Likewise, the reporting of build results is actually more useful in a task based environment. The tasks provide a level of useful abstraction. It is now meaningful to report all of the delivered changes in a build, since the changes look like: Task 1: change ‘about’ dialog to use color scheme; Task 2: add ‘fast exit’ behavior; Task 3: fix PTR 7310: Save does not work after refresh.

Instead of looking like this: help.c (1.2), display.c (1.22); main.c (1.7), exit.c (1.5), commands.h (1.9); main.c (1.8), save.c (1.11), display.c (1.21)

The first report is just as detailed as the second, as far as the developers are concerned – the developer who resolved task #1 knows what was required. The task-based report is orders of magnitude more useful for testers and project managers, however.

Conflicts
One additional bonus of task based development that is literally impossible to reproduce using file-based development is the detection of file dependencies. Look at the report above: notice that main.c is reported as changing in two different lines. What does that mean?

In this case, it means that the implementation of task #3: Fix PTR 7310, has a dependency on the implementation of task #2. That’s obvious, and any system could catch it. Likewise, task #1 depends on task #3. What you can only catch using the task approach is that task #1 depends on task #2! Why? This is because of the sideways connections from one file to another using the task record. The only link from display.c to main.c is the fact that versions of both are associated with task #3. Without the task linkage, there wouldn’t be any record, other than a purely chronological one, that display.c (1.22) was developed in an environment that already had main.c (1.8).

This kind of second-order data requires the web of task relationships to be maintained. It’s not always a problem if main.c (1.8) isn’t available (the “about” box will probably work fine) but sometimes it is. And the only way to catch it is to record the connections between the files. This can be a challenging report to write, since the number of steps between the ‘starting’ file and the ‘ending’ file in the chain can be arbitrarily long.
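A sketch of such a report follows, using the three tasks from the example above. It assumes simple linear revision numbers on a single branch, so that a higher number for the same file means a later version; the function names are hypothetical.

```python
def version_key(version):
    """Turn '1.22' into (1, 22) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# File versions delivered by each task, taken from the example report.
TASKS = {
    1: {"help.c": "1.2", "display.c": "1.22"},
    2: {"main.c": "1.7", "exit.c": "1.5", "commands.h": "1.9"},
    3: {"main.c": "1.8", "save.c": "1.11", "display.c": "1.21"},
}


def direct_dependencies(tasks):
    """Map each task to the tasks that delivered an earlier version of one of its files."""
    deps = {task_id: set() for task_id in tasks}
    for task_id, files in tasks.items():
        for other_id, other_files in tasks.items():
            if other_id == task_id:
                continue
            for path, version in files.items():
                if path in other_files and version_key(other_files[path]) < version_key(version):
                    deps[task_id].add(other_id)
    return deps


def all_dependencies(tasks):
    """Follow the chain of direct dependencies to any depth."""
    direct = direct_dependencies(tasks)
    closure = {}
    for task_id in tasks:
        seen, stack = set(), list(direct[task_id])
        while stack:
            dep = stack.pop()
            if dep not in seen:
                seen.add(dep)
                stack.extend(direct[dep])
        closure[task_id] = seen
    return closure


for task_id, deps in all_dependencies(TASKS).items():
    print(task_id, sorted(deps))
# 1 [2, 3]
# 2 []
# 3 [2]
```

Task 1 never touches main.c, yet the walk through task 3 surfaces its dependency on task 2, which is exactly the sideways connection a file-by-file history cannot show.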

Promotion
As you can see from the above section, tasks or change sets aren’t really the independent units we would like them to be. They depend, implicitly or explicitly, on the work that has been done before.

One result of this is that it’s a bad idea to try to separate tasks. When a developer completes a task, it’s completed in the context of all the tasks configured in the developer’s workspace. That means all the tasks making up the current baseline, plus all the tested and approved tasks that may have been included in the developer’s change list.

When a nightly build succeeds, it doesn’t validate that each single task is approved for further individual progress up the task lifecycle. Instead, it confirms that the entire school of tasks (picture the silvery flight of a school of reef fish in a Jacques Cousteau special) is approved for further progress. In short, when you promote a task, promote all the tasks with it. All of them were built together, all of them were tested together, and so all of them should advance together. There’s no such thing as one task.
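Promoting the whole school at once is easy to automate. The sketch below advances every task in a successfully built configuration and skips developer configurations, whose tasks may still be in progress. The function name, the `kind` labels, and the `approved` state are assumptions.

```python
def promote_build(tasks, change_list, build_ok, kind, next_state="approved"):
    """Advance every task in a built configuration, or none of them.

    tasks:       task id -> current lifecycle state (updated in place)
    change_list: task ids that took part in the build
    kind:        assumed label for the configuration, e.g. "nightly" or "developer"
    """
    # Developer configurations are skipped: their tasks may not be finished.
    if not build_ok or kind == "developer":
        return []
    for task_id in change_list:
        tasks[task_id] = next_state
    return list(change_list)


tasks = {"T1": "finished", "T2": "finished", "T3": "working"}
promoted = promote_build(tasks, ["T1", "T2"], build_ok=True, kind="nightly")
print(promoted)  # ['T1', 'T2']
print(tasks)     # {'T1': 'approved', 'T2': 'approved', 'T3': 'working'}
```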

Propagation
The final topic to address when doing task based development is propagation of changes across parallel release streams. There is nothing inherently different about propagation of tasks versus propagation of file changes. The differences lie in the fact that using tasks to manage change makes it easier to discuss propagation, easier to understand and control what is being propagated, and easier to understand the impact of propagation conflicts.

In a parallel development scenario, one of the important considerations for task propagation is that the “earlier” or “wrong” tasks should not supersede “later” or “right” tasks solely on the basis of a more recent finish time. That is, a task being developed for release 1.1 SP 2 should not supersede a task being developed for release 2.0.

Similarly, a task that is specific to customer A should not replace a task that is usable by all customers unless the configuration in question is, in fact, specific to customer A. Even then, it should take some consideration.
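One way to keep finish time from deciding these conflicts is to give each configuration an explicit precedence for release streams (or customer variants) and fall back on finish time only within a stream. This is a sketch under that assumption; `pick_winner`, the `stream_rank` mapping, and the sample data are all hypothetical.

```python
def pick_winner(candidates, stream_rank):
    """Choose which task supplies a file when several tasks changed it.

    candidates:  list of dicts with 'task', 'stream', and 'finished' keys
    stream_rank: stream name -> precedence in this configuration (higher wins)
    """
    # Stream precedence is compared first; finish time only breaks ties
    # between tasks that belong to the same stream.
    return max(candidates, key=lambda c: (stream_rank[c["stream"]], c["finished"]))


candidates = [
    {"task": "T40", "stream": "2.0", "finished": 40},
    {"task": "T50", "stream": "1.1-sp2", "finished": 50},  # finished later
]
# Precedence as seen from a release 2.0 configuration.
rank_for_2_0 = {"2.0": 2, "1.1-sp2": 1}
print(pick_winner(candidates, rank_for_2_0)["task"])  # T40
```

The automated choice is only a starting point; as the article notes, conflicting changes still need review by the people who understand them.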

Obviously, the propagation of tasks from one configuration set to another can be partially automated. But the final reconciliation of conflicting task changes is going to require careful review by the development and testing teams.

Conclusion

There are two key points to be taken from this subject. First, task based development provides a higher degree of abstraction than file- or branch-based development. As a result, moving to a successful task based development cycle will result in an increase in productivity for development staff. Second, there is a fair amount of complexity in the TBD approach. This is a good thing: it means that TBD is a new direction for development and research. For our purposes, it also means that commercial vendors who have invested resources in developing TBD products have an edge over their non-TBD competitors. If you are trying to bolt task level commit on top of a non-task-supporting product, you should keep your eyes on the market. At some point, the investment you make will prove that there is value in adopting a third-party tool to provide the same features you have coded by hand.

CMCrossroads is a TechWell community.