What to Do With Those Pesky Tags

Tags, also known as labels, are a common piece of metadata in Version Control systems. They symbolically link many file/revisions together under a human-understandable name. They come in many types and flavors, and their use is often hotly contested, sometimes all the way to the political level. First, a little summary of what tags are and what they are often used for.

Fixed Tags: Fixed tags are tags that are applied once and then "never" moved. By "never," I mean by anyone other than a CM person. They are used to identify a fixed set of file/revisions and generally correspond to a milestone point in time (also called a version). Even a Configuration Manager should refrain from playing with fixed tags as it could easily invalidate an auditable milestone. All changes to fixed tags should either be logged and tracked by the Version Control (VC) tool itself, or by the CM person who made the changes in an official CM Record.

Branch Tags: Branch tags are tags that identify a specific branch. Depending on the VC tool in use, how these tags are defined and used will vary, but in general referencing a branch tag will yield the latest revision along that branch. Some tools allow branch tags to be used in conjunction with a timestamp to select file/revision sets along the branch that are earlier than, later than, earlier than or equal to, or later than or equal to a timestamp.
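
As a rough sketch of how a branch tag combined with a timestamp might resolve to a revision, the Python below models one element's history as a list of records. The `Revision` type, the `resolve_branch_tag` function and the sample tag names are all made up for illustration and do not correspond to any particular VC tool. Only the "earlier than or equal to" comparison is shown.

```python
from datetime import datetime
from typing import Iterable, NamedTuple, Optional


class Revision(NamedTuple):
    number: str          # revision number, e.g. "1.4.2.2"
    branch: str          # branch tag the revision lives on
    committed: datetime  # commit timestamp


def resolve_branch_tag(
    revisions: Iterable[Revision],
    branch: str,
    as_of: Optional[datetime] = None,
) -> Optional[Revision]:
    """Return the latest revision on `branch`, optionally no later than `as_of`.

    Hypothetical helper: with no timestamp the branch tag simply yields the
    newest revision; with one, revisions committed after it are ignored.
    """
    candidates = [
        r for r in revisions
        if r.branch == branch and (as_of is None or r.committed <= as_of)
    ]
    return max(candidates, key=lambda r: r.committed, default=None)


# Sample history for a single element (invented data).
history = [
    Revision("1.4.2.1", "REL_1_3_BRANCH", datetime(2006, 9, 1)),
    Revision("1.4.2.2", "REL_1_3_BRANCH", datetime(2006, 10, 10)),
    Revision("1.5", "TRUNK", datetime(2006, 10, 12)),
]
```

Calling `resolve_branch_tag(history, "REL_1_3_BRANCH")` picks the newest revision on the branch, while passing `datetime(2006, 9, 30)` as `as_of` falls back to the earlier one.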

Floating Tags: Floating tags are tags that move from revision to revision somewhat like branch tags, but they cannot be used to specify a branch. They are generally used to select file/revisions that will be part of some future milestone. In general, they float to the latest revision on the branch they are created on unless some tool-specific restriction gets invoked. Some of the more common restrictions are:
  • A number in the revision numbering sequence is manually forced, causing a gap in the numbering sequence
  • A specific version is promoted to a different state (e.g., Development to Test)
  • The tag is frozen

Frozen Tags: Frozen tags are tags that are blocked from being changed. The most common use is to change a floating tag into a fixed tag. The second most common use is to lock a tag from being modified by anyone. Of course, CM Administrators can use special tools or modes to thaw frozen tags and in some cases manipulate frozen tags directly, but good practice and regulatory agencies require the CM person who made the changes to note the modification in an official CM Record.
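
The relationship between floating and frozen tags can be modeled in a few lines. This is only a toy model under my own assumptions: the `FloatingTag` class, its method names and the shape of the CM Record entry are hypothetical, and real tools enforce these rules in their own ways.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FloatingTag:
    """Toy model of a floating tag that can be frozen and thawed."""
    name: str
    revision: str
    frozen: bool = False
    cm_record: List[str] = field(default_factory=list)  # stand-in for a CM Record

    def float_to(self, new_revision: str) -> bool:
        """Move the tag to a newer revision unless it is frozen.

        Returns True if the tag moved, False if the freeze blocked it.
        """
        if self.frozen:
            return False
        self.revision = new_revision
        return True

    def freeze(self) -> None:
        """Turn the floating tag into an effectively fixed tag."""
        self.frozen = True

    def thaw(self, who: str, reason: str) -> None:
        """Administrative override; every thaw is logged before it takes effect."""
        self.cm_record.append(
            f"{who} thawed {self.name} at revision {self.revision}: {reason}"
        )
        self.frozen = False
```

The point of the design is that `thaw` cannot be called without leaving an entry behind, which mirrors the requirement to record any manipulation of a frozen tag.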

Uses for Non-Branch Tags: The most common uses for non-branch tags include:
  • Identifying files/revisions used in a build
  • Identifying the promotion state of a build
  • Identifying the components of a release candidate
  • Identifying the source and destination revisions involved in a merge
  • Identifying the base revisions involved in a branch creation

In our environment I may have up to 8 branches in use at any one time. Some of those branches have hundreds of builds tagged. Each branch will have at least one merge source tag. The trunk will have one branch base tag and one merge destination tag per branch plus trunk build tags (currently around 1100 and counting). As you can see, that's a lot of tags, and we only tag successful builds! Having all of these build tags allows me to determine exactly what changed between builds - down to the character level if necessary. It lets me trend the number of elements added, modified or removed between builds and, in conjunction with a Defect Tracking system, lets me assess the overall quality of the changes. This can in turn be used by Management to objectively support release/delay decisions.
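
To make the build-to-build comparison concrete, here is a minimal sketch that assumes each build tag has already been expanded into a mapping of element path to revision number. How that mapping is extracted depends entirely on the VC tool; the `compare_builds` function and the sample paths are hypothetical.

```python
from typing import Dict, List


def compare_builds(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Classify elements as added, modified or removed between two build tags.

    `old` and `new` map an element path to the revision carrying that tag.
    """
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(
        path for path in old.keys() & new.keys() if old[path] != new[path]
    )
    return {"added": added, "modified": modified, "removed": removed}


# Invented snapshots of two consecutive build tags.
build_a = {"src/main.c": "1.12", "src/util.c": "1.4", "doc/README": "1.2"}
build_b = {"src/main.c": "1.13", "src/util.c": "1.4", "src/net.c": "1.1"}

changes = compare_builds(build_a, build_b)
counts = {kind: len(paths) for kind, paths in changes.items()}
```

Recording `counts` for every pair of consecutive builds gives the trend data; diffing the actual revisions listed under `modified` gives the character-level detail.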

A particular build can have additional tags applied to indicate its promotion state (such as Functional Test, Systems Test, Acceptance Test or Release, though the actual tag more often resembles FuncTest_1_3). Many of today's VC tools provide this information using metadata other than tags, but it is always possible to use the tags as a form of lowest common denominator.

Using tags to identify a release candidate is accomplished by applying a second tag on top of a build tag. This double-tagging allows for easier filtering, especially if one does a lot of builds. It is also not uncommon for this type of tag to be the only one used for promotion tagging.

Merging is always a problem and keeping track of which revisions were merged together is always a challenge. Some tools such as ClearCase and CM+ make this much easier and provide mechanisms other than tags for doing so. Once again, however, tags form the lowest common denominator. If only a diff-2 style merge is used, then the source of a merge is tagged using a tag such as MRGSRC_ACS_20061010 and the resulting revision is tagged using a tag such as MRGTGT_ACS_20061010. If a diff-3 style merge is performed, then the common ancestor revision is also tagged using a tag such as MRGBASE_ACS_20061010, though it is more common for the common ancestor to be the base of a branch (see below). In practice, I use four tags: Common Ancestor, Merge Source, Pre-merge Target and Post-merge Target. This allows me to also determine what changes were made during the conflict resolution phase of the merge. Once again, the primary motivators for using these tags are traceability, auditability and metrics.
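
Generating the merge tag names from one helper keeps them consistent across merges. The sketch below follows the MRGSRC/MRGTGT/MRGBASE pattern shown above. The `MRGPRE` prefix for the pre-merge target is an invented placeholder, as are the function name and the `style` values.

```python
from datetime import date
from typing import Dict

# Which roles get tagged for each merge style.
_ROLES = {
    "diff2": ("source", "target"),
    "diff3": ("base", "source", "target"),
    "four": ("base", "source", "pre_target", "target"),
}

# Tag prefix per role. MRGPRE is a hypothetical name for the pre-merge target.
_PREFIX = {
    "base": "MRGBASE",
    "source": "MRGSRC",
    "pre_target": "MRGPRE",
    "target": "MRGTGT",
}


def merge_tag_names(branch_id: str, merge_date: date, style: str = "diff2") -> Dict[str, str]:
    """Return a role -> tag name mapping for one merge.

    Raises KeyError if `style` is not one of "diff2", "diff3" or "four".
    """
    stamp = merge_date.strftime("%Y%m%d")
    return {role: f"{_PREFIX[role]}_{branch_id}_{stamp}" for role in _ROLES[style]}


tags = merge_tag_names("ACS", date(2006, 10, 10), style="four")
```

With the four-tag style, comparing the pre-merge target against the post-merge target shows everything the merge changed, including whatever was edited by hand while resolving conflicts.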

The final type of tag listed above is applied to the branch point when a branch is created. It is generally easy to compute the base point of a branch for any individual element, but doing so for the entire branched code base is not a pretty sight. Using the base tag always makes it possible to determine what has changed during the life of the child branch as well as what has changed on the parent branch during the same time. Comparing the two yields at best a list of potential conflicts that may need manual intervention and at worst a ballpark approximation of the difficulty in merging the child branch back into the parent.
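
The comparison described here reduces to two set differences and an intersection. The sketch assumes the base tag and both branch tips have been expanded into path-to-revision mappings; the function names and sample data are hypothetical.

```python
from typing import Dict, List, Set


def changed_since(base: Dict[str, str], tip: Dict[str, str]) -> Set[str]:
    """Paths added, removed or revised between the base tag and a branch tip."""
    return {
        path for path in base.keys() | tip.keys()
        if base.get(path) != tip.get(path)
    }


def potential_conflicts(
    base: Dict[str, str],
    parent_tip: Dict[str, str],
    child_tip: Dict[str, str],
) -> List[str]:
    """Paths touched on both parent and child since the branch point."""
    return sorted(changed_since(base, parent_tip) & changed_since(base, child_tip))


# Invented snapshots: branch base tag, current parent tip, current child tip.
base_tag = {"a.c": "1.3", "b.c": "1.7", "c.c": "1.1"}
parent = {"a.c": "1.4", "b.c": "1.8", "c.c": "1.1"}
child = {"a.c": "1.3.2.2", "b.c": "1.7", "c.c": "1.1.2.1"}

conflicts = potential_conflicts(base_tag, parent, child)
```

An element in the result is only a candidate for conflict; the two sets of changes may still merge cleanly. The length of the list is the ballpark measure of merge difficulty.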

Project Management periodically asks if I can prune the tags so there is less "useless" information being presented to developers. Now comes the fun part. I live and breathe CM. I know that I will need to reproduce a build from 6 months or a year ago. I know I will need to do data mining for future QA analysis. I know that getting rid of metadata such as tags makes traceability and auditability much more difficult (and using lists generated from the tag metadata before it is removed gives me even more files I need to version!). I know that I would have to document each tag removed in a CM Record, along with the file/revisions that were affected if I am to do things "The CM Way." Explaining this to Management (or even developers) is like explaining why a pack rat hides things or why test-first is such an important aspect of the Agile development model.

So I do what I can. I set up the GUIs and IDEs to filter out most of the tags. I generate reports that list the information they need without extraneous data that tends to confuse the audience. Some of the VC tools even make this simple; others seem to go out of their way to make sure we always see all of the metadata whether we want to or not. I do whatever I can to keep my tags. And I keep the metadata duplicated in a non-tool specific metadata repository just in case.
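
Where a GUI or report generator accepts a custom filter, hiding tags by naming pattern is usually enough. The following is a sketch using Python's standard `fnmatch` module; the `visible_tags` function, the `BUILD_` and `RC_` prefixes and the pattern list are assumptions for illustration, not any tool's configuration.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def visible_tags(tags: Iterable[str], hidden_patterns: Iterable[str]) -> List[str]:
    """Return the tags that match none of the hidden glob patterns.

    Matching is case-sensitive. The tags stay in the repository; they are
    only left out of what is shown.
    """
    patterns = list(hidden_patterns)
    return [
        tag for tag in tags
        if not any(fnmatchcase(tag, pattern) for pattern in patterns)
    ]


all_tags = ["BUILD_1099", "BUILD_1100", "MRGSRC_ACS_20061010", "FuncTest_1_3", "RC_1_3"]
developer_view = visible_tags(all_tags, ["BUILD_*", "MRG*"])
```

Filtering at display time means nothing has to be deleted, so no CM Record entries are needed and the full tag history remains available for audits.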

I would love to hear the experiences of others in how they have used tags, or could have used tags to make things easier. I would also love to hear the occasional disaster story. All is grist for the mill...


Ben Weatherall is currently based in Fort Worth, Texas, where he practices Practical CM on a daily basis using a combination of CVS and custom tools to support a modified Agile-SCRUM development methodology. He is a member of IEEE, ASEE (Association of Software Engineering Excellence – The SEI’s Dallas-based SPIN Affiliate), NTLUG (North Texas Linux Users Group), and PLUG (Phoenix Linux Users Group).