In the October issue of the CM Journal, we
reported on a literature study we had done on SCM-specific metrics.
In this issue, we focus on some case
studies we did at ABB Malmö and Lund Institute of Technology (LIT) on
SCM-metrics. We report on a number of metrics that we found useful and also on
our experience from using some of them.
Introduction
ABB is a multi-national company that
specializes in Power and Automation Technologies. It provides ready solutions
for its customers and over the years a growing part of the value (and also
costs) of its products comes from software rather than hardware. ABB Malmö
wanted to look into the needs for new SCM tools/functionality, improvement
and/or enforcement of existing SCM processes - and increased visibility of SCM
in the organization (don't we all appreciate being recognized for the good work
we do).
At LIT we carry out research and education
in software engineering in general - and in SCM in particular. We teach an
independent course in SCM, but also introduce the topic in a course/project
where eXtreme Programming (XP) is used to teach students aspects of programming
in a team [Hedin et al., 2005]. We are interested in SCM-specific metrics data,
first to convince our students of the usefulness of XP practices like
Continuous Integration and subsequently to show them that certain problems may
be caused by not following certain practices. We obtain and use metrics data
from about ten student XP teams during their six-iteration projects each spring
term.
In general, there can be many reasons for
looking at - and actively using - SCM-specific metrics. We may want to justify
the various costs of SCM to the rest of the organization by showing some sort
of return on investment. More specifically, we may want to obtain funding for
certain SCM activities (tools or processes) and want to be able to hypothesize
value for the money we ask for. After a sponsored activity, we may want to show the
sponsor that we have actually provided value for the money. Metrics - whether
SCM-specific or not - are also key in process improvement and for planning and
scheduling of resources. Finally, metrics and their publication will provide
more visibility in an organization for the activities that are measured and
monitored.
A particular characteristic of SCM is that
(parts of) SCM-designed processes are often carried out by other people, and
that one person's gain is often another person's pain. Therefore, numbers can
be very important in convincing people that it is in the common interest of the
group that they "do the extra work".
Analysis of potential SCM-specific metrics
We started out by using an analytical
approach to finding possible SCM-specific metrics that could be tried out.
First, we tried to come up with types of metrics depending on what would be
their purpose - and then for each category we would do a brainstorm to come up
with some specific examples.
From our analysis we distinguished four
different reasons for carrying out measurements: deviations - to see how well
the actual processes comply with the prescribed ones; resources - to allow us
to plan and schedule based on data from the past and information about the
current situation; indicators - to enable us to spot trends before we end up
in trouble; and accuracy - to see how well our predictions fit with what
happens.
When we had established the categories of
metrics, we started brainstorming specific examples from the SCM domain. These
examples that we give below are very specific for our particular SCM context
and our needs and curiosity. Other people may find some of them to be strange
and others to be useful - and may even find that the lists are incomplete.
However, we believe that the categorizations can be a useful starting point for
others.
Examples of SCM-specific metrics that came
up during our initial brainstorm:
- resources
  - time/effort to produce a build/release
  - time/effort to perform a Configuration Audit
  - time/effort to create various types of workspaces
  - time/effort (cost) to create/handle a branch
  - time/effort to integrate a bugfix/change
  - time/effort (cost) to re-create an old release
- deviations
  - # of SCM-created defects (due to SCM processes)
  - # of build failures
  - # of defects detected by SCM (before test!)
  - # of defects/problems caused by circumventing the SCM process
  - # of merge conflicts/double maintenance/...
  - # of checkins without comment
  - # of and cost of wrong CCB decisions
In the following, we describe our
experience with a number of SCM-specific metrics that we decided would be
interesting for us to dig deeper into.
Checkin comments
The use of checkin comments is a way to
provide more information about the history of a component than just the trail
of versions. Furthermore, proper use of checkin comments can provide
traceability to what change request was implemented in a specific version and
to the other components and versions that were part of the same logical change.
For this reason our SCM processes regarding checkin prescribe the use of proper
checkin comments.
At LIT, we had not expected the students to
always write proper checkin comments, but we were surprised to find that they
almost never wrote checkin comments. We had explained the importance and use of
checkin comments, but from interviews we discovered that they found it
difficult to know what to write (missing "templates") and that they actually
never felt a need to read comments (they could contact the author of the change,
as they were in the same room at the same time). Now we provide students with some
"templates" for comments and explain that though comments are important in
general, they may not be extremely important for their short-lived XP projects.
At ABB Malmö, the SCM process explicitly
says that every checkin must have a comment. Nevertheless, a significant
number of checkins did not have a comment. There were several reasons for not
supplying comments, but they fell into two main categories: lack of attention
to the process (because developers did not care, were sloppy or did not feel
they had the time), and checking in "private versions" on a branch, in which
case they did not make a comment until the branch was finally merged back into
the main line. Developers were reminded of "what the process says" and the
number of checkins with comments increased to a satisfactory level - now this
metric is measured only occasionally to check up on the state.
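As an illustration, the "# of checkins without comment" metric can be computed from an export of the version history. The following is a minimal sketch under stated assumptions: the record type, its field names and the sample data are hypothetical, and a real SCM tool would need its own export step to produce such records.

```python
from dataclasses import dataclass


@dataclass
class Checkin:
    """One checkin record; the field names are illustrative assumptions."""
    author: str
    comment: str


def uncommented_share(checkins):
    """Return the fraction of checkins whose comment is empty or whitespace."""
    if not checkins:
        return 0.0
    missing = sum(1 for c in checkins if not c.comment.strip())
    return missing / len(checkins)


# Made-up sample data, only to show the calculation.
sample = [
    Checkin("anna", "CR-17: fix null check in parser"),
    Checkin("bo", ""),
    Checkin("anna", "   "),
    Checkin("carl", "merge feature branch"),
]
print(f"{uncommented_share(sample):.0%} of checkins lack a comment")
```

Note that a count like this says nothing about whether the comments that do exist are useful.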
In neither case did we measure the
"quality" of the checkin comments.
Build-related metrics
The primary build-related metric is
"time/effort to produce a build". Especially when we use
Test-Driven-Development, it is important that builds are fast, so we do not
slow down the "write test, run tests, write code, run tests" cycle. For our
student projects there is so little code that build time is not an issue - but
"real" XP projects might want to monitor this metric.
At ABB Malmö they use nightly builds. Quite
often the nightly build would break and have to be fixed to get a working build.
In the majority of cases the build would not actually be broken (just warnings)
or it would be very easy and quick to fix the problem. However, in some cases
the problem would be more serious and it could take quite some time to get the
build working again. We were looking for some early indicators that could warn
us that the build was about to break beyond a quick and easy repair. We
measured warnings and failures to see if there was a trend or threshold value
prior to serious build problems.
We did not find any direct correlation
between warnings/failures and the difficulty in fixing a broken build, most
probably because what we measured (warning/failure) was the result of a
compound effect or because there were many different causes of broken builds.
We would have liked to get further information from the developers fixing a
broken build about what had actually been the cause - but we could not convince
them that they should fill in more forms.
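A threshold check of the kind described above could look roughly like the following sketch. It flags nights where the trailing average of warnings exceeds a chosen limit. The function names, the window size, the threshold and the warning counts are all assumptions made for illustration; as noted, such an indicator did not turn out to predict serious build problems in our data.

```python
def moving_average(values, window):
    """Return trailing moving averages, starting at the first full window."""
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def nights_over_threshold(warnings_per_night, window, threshold):
    """Return indices of nights whose trailing average exceeds the threshold."""
    averages = moving_average(warnings_per_night, window)
    return [
        i + window - 1  # map back to the index of the night itself
        for i, avg in enumerate(averages)
        if avg > threshold
    ]


# Made-up warning counts for ten nightly builds.
warnings = [3, 4, 2, 5, 6, 9, 12, 15, 14, 20]
print(nights_over_threshold(warnings, window=3, threshold=10))  # [7, 8, 9]
```

Averaging over a few nights smooths out single noisy builds, at the price of reacting a little later to a real trend.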
On the student XP projects we use JUnit and
a broken build is defined as a "red" repository. Some coaches would check their
team's repository regularly (either manually or by using CruiseControl), others
did not. All teams experience broken builds at some time. There seem to be indications
that teams that tend to ignore (be immune to) a "red" repository
are more prone to fatal and longer lasting broken builds than teams that fix
the build problem immediately. We have chosen not to implement a strict and
continuous control of team repositories as we believe that sometimes students
learn more from living through their mistakes.
"Time to produce a release" is another
build-related metric. In the course of six iterations (of one and a half days
of work) our student XP teams produce four releases, so there is a big
incentive for them to keep down the time and effort costs of producing
releases. Most teams create an automated release process that checks out a
release candidate to a clean workspace, compiles the code, does functional and
physical audits and zips everything into one file ready to ship to the
customer. Most often teams will have a competition over who can produce a release
in the shortest time - the record to break is 32 seconds. On one hand such a
competition is good as the students can see a clear return on investment from
doing the automation over a manual release process that sometimes can take
hours. On the other hand it is dangerous as it removes the students' focus from
the time-consuming part of producing a release - bringing the repository into a
releasable state. However, when they have an automated release process it is
less critical that they need a couple of tries to get the code releasable.
Merge conflicts
On our student XP projects many people are
working in parallel on little code (especially in the beginning), so merge
conflicts are inevitable. However, not all teams are equally prone to getting
merge conflicts, so we wanted to use the metrics data for a cause analysis -
especially regarding complex merge conflicts. Unfortunately, we could not
automatically distinguish between trivial and complex merge conflicts, and
students were not consistent in manually providing us with information.
Our hypothesis was that infrequent updates
would create more merge conflicts. There were weak indications to support the
hypothesis, but our data material was not conclusive as there were big
variations between teams.
From informal interviews it appeared that
merge conflicts tended to be trivial. In the beginning there were conflicts due
to many people working on little code; later conflicts were seemingly caused by
overly long tasks, and a peak around iteration 3/4 was due to many teams being
forced to refactor their basic data structure, which most of them did the
big-bang way (something that team05 never recovered from).
Equally as important as doing frequent
updates is probably to commit often (i.e., small, quick tasks). Data indicate
that teams with a higher number of commits during an iteration also had a lower
number of merge conflicts. But again the data material is not strong.
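One way to quantify such a relationship is a correlation coefficient over the per-team numbers. The sketch below computes Pearson's r by hand. The per-team figures are made up for illustration and are not our measured data, and with only about ten teams any coefficient should be read with caution.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's correlation coefficient for two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical numbers for five teams in one iteration.
commits = [12, 20, 8, 30, 15]
conflicts = [9, 5, 11, 2, 7]
print(f"r = {pearson(commits, conflicts):.2f}")
```

A value close to -1 would support the idea that more commits go together with fewer conflicts, but it says nothing about which one causes the other.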
CCB-related metrics
On our student XP projects, Change Impact
Analysis and Estimation was pretty accurate on average, which meant that
the student teams delivered pretty much what they had promised for an
iteration. However, the single estimates were often wildly off - so beware of
averages when you do metrics.
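A small worked example shows how this can happen. In the sketch below the signed errors cancel out, so the average error is zero although every single estimate is off by four hours. The numbers and function names are invented for illustration.

```python
def mean_error(estimates, actuals):
    """Average signed error; over- and underestimates cancel each other out."""
    return sum(e - a for e, a in zip(estimates, actuals)) / len(estimates)


def mean_absolute_error(estimates, actuals):
    """Average size of the error, regardless of its direction."""
    return sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(estimates)


# Made-up task estimates and outcomes, in hours.
estimated_hours = [4, 8, 2, 6]
actual_hours = [8, 4, 6, 2]

print(mean_error(estimated_hours, actual_hours))           # 0.0
print(mean_absolute_error(estimated_hours, actual_hours))  # 4.0
```

Reporting the mean absolute error (or the spread) next to the plain average makes it visible when single estimates are unreliable.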
At ABB Malmö, we had the possibility to dig
deeper into various metrics related to the CCB. Inspired by the change request
graphs [Jönsson, 2004], we extended them to cover all phases of the handling of
a change request and focused on how much was queuing up and on the throughput in
each phase. This gave us a good overview of how things were going and it was
possible to quickly identify bottlenecks. It was easy to compare with the
previous week's numbers if needed, so we did not want to clutter up the graph
with these numbers too. The raw data was already present in our system and it
did not require much work to pull it out and process and format it in an Excel
spreadsheet.
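The same two numbers per phase, queue size and throughput, can also be derived with a few lines of code instead of a spreadsheet. The following is a minimal sketch: the phase names, the change request identifiers and the data layout are assumptions made for illustration, not the actual ABB workflow or tool export.

```python
from collections import Counter

# Phase names are illustrative assumptions, not the actual workflow.
PHASES = ["submitted", "analysis", "implementation", "verification", "closed"]


def queue_sizes(current_phase_by_cr):
    """Count how many change requests currently sit in each phase."""
    counts = Counter(current_phase_by_cr.values())
    return {phase: counts.get(phase, 0) for phase in PHASES}


def throughput(transitions):
    """Count how many change requests left each phase during the period."""
    counts = Counter(from_phase for _cr, from_phase, _to_phase in transitions)
    return {phase: counts.get(phase, 0) for phase in PHASES}


# Made-up snapshot and one week of transitions: (CR id, from phase, to phase).
current = {
    "CR-1": "analysis",
    "CR-2": "analysis",
    "CR-3": "implementation",
    "CR-4": "closed",
}
week = [
    ("CR-3", "analysis", "implementation"),
    ("CR-4", "verification", "closed"),
]

queues = queue_sizes(current)
flow = throughput(week)
for phase in PHASES:
    print(f"{phase:15} queued={queues[phase]} left_this_week={flow[phase]}")
```

A phase with a growing queue and low throughput is a candidate bottleneck.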
Tool-related metrics
There was a tool-related metric that we
stumbled upon almost by chance - "the space cost of branching a project". In
our case, it turned out that branching a project doubled the size of the database.
This was an effective showstopper for ideas of implementing some branching
patterns [Appleton et al., 1998]. On the student projects, physical size was
never the limiting factor; instead, it was problems with understanding what
branching was all about.
While we were at it, we came up with a
number of other metrics related to SCM tools:
- time to checkout a project (establish a workspace)
- time to tag a release/baseline
- time to create a(n optimal) build
- time to set up a workspace
- time to checkin a change (workspace)
However, we have not tried out any of them
yet, either because the metric did not answer a (pressing) question or because
it did not trigger our curiosity.
Summary
These are some of the metrics we have found
interesting in our SCM work and have experience from using. What metrics will
actually be useful for other people will depend very much on the specific
processes they have in their particular context (and on the needs - don't
measure what is not used).
Something that we did not measure - and
that is difficult to measure - is the frequency of use of the data. In general,
if the data is not used, it should not be collected. If we have it in a
database, we can monitor the number of queries using it.
Marc Girod has suggested that a better and
more open way to communicate our findings and experience would be through the
CM wiki to allow other people to easily extend or comment on our work. In
principle we agree, but we find it difficult to express the material in a
concise way - maybe we need some kind of pattern language to do that, so we can
talk about SCM metrics patterns and not just SCM branching patterns [Appleton
et al., 1998] ;-)
From some of our examples it might seem
that there is very little interest in supporting SCM-specific metrics and in
supplying additional information for such metrics. In some cases that may be
true, but we also had much positive experience.
Finally, we found that working with metrics
is actually an iterative effort. In retrospect, we should probably have used a
more agile approach (just doing a few canonical metrics to start with) instead
of starting with an analytical approach. The analytical approach did help us,
but we also discovered that when we actually began implementing some metrics,
ideas for new metrics started pouring in almost immediately. It is an experience
that we can highly recommend.
References
[Appleton et al., 1998]: Brad Appleton,
Stephen P. Berczuk, Ralph Cabrera, Robert Orenstein: Streamed Lines - Branching Patterns for Parallel Software Development,
in Proceedings of the 1998 Pattern Languages of Programs Conference,
Monticello, Illinois, August 11-14, 1998.
[Hedin et al., 2005]: Görel Hedin, Lars
Bendix, Boris Magnusson: Teaching Extreme
Programming to Large Groups of Students, Journal of Systems and Software,
Volume 74, Issue 2, January 2005.
[Jönsson, 2004]: Henrik Jönsson: Graphs for
Change Requests, CM Journal, December 2004.
Lars Bendix, Lund Institute of Technology, Sweden.
Dag Ehnbom and Ulf Steen, ABB, Malmö, Sweden.