I find it rare to see
a project with too many metrics. Quite the opposite is usually the
case. The problem is that establishing metrics is a lot more work than
identifying what you want to measure. It's hard to collect the data.
It's hard to mine the data. And presenting the comparisons and
trends that make up the metrics is no simple feat. However, as CM tools
move more and more into the realm of ALM and beyond, we'll see some
changes. As integration across the lifecycle becomes more seamless,
the focus will move to dashboards that not only provide standard
metrics, but which are easily customized to mine and present the data
necessary.
The goal is to have a rich collection of data and an easy way of mining the data to establish the metrics for those measures deemed important to process, team and product improvement.

When you measure something and publish the measurement regularly, improvement happens. This is because a focus is brought on the public results. However, the interpretation of the measure must be clear. For example, are glaciers falling into the ocean a sign of global warming (weakening and falling) or global cooling (expanding ice pushing the edges into the ocean)? A wrong interpretation can be very costly. So a good rule is to understand how to interpret a measure before publishing it.

Looking at the software world, is the number of lines of code generated per month a reasonable measure? I want my team to create a given set of functionality using the minimal number of lines of code. So larger numbers are not necessarily a good thing and may be a bad thing. Lines of code generated is not a good measure for productivity. However, averaged over a project, and compared from project to project (using similar programming technology), it is a fair measure of project size and is also useful for overall "bugs per KLOC" metrics. The point here is that you must select metrics that can clearly be interpreted. And when you publish these regularly, the team will tend toward improving the metric because they understand the interpretation and how it ties back to what they're doing.

Creating Effective Metrics - What's Needed

I don't intend to go into detail about which metrics I find useful and which I don't. If you wish, you can view my article from a couple of years back http://www.cmcrossroads.com/content/view/6717/259/ . Instead I want to focus on how to build an infrastructure so that you can easily generate metrics and test out their effect on your project. The idea is that if it is easy to create metrics, they will be used to improve your process, team and product. So what do we need?

1. A Single Next Generation Repository Across the Life Cycle

I'm sure there are a lot of you who can balance multiple repositories and do magic to get data from one to another, knitting multiple data sources together into a fountain of data. But no matter how you cut it, there's no replacement for a single repository that can hold the data for your entire application life cycle. I don't really want to mix transient data, such as compile results, with persistent data, such as management information. But I want all of the persistent information in one place, easily accessible.

I want the schema for this data under control of a small team or a single person who understands that the entire team needs all of the data, and who has a thorough understanding of the ALM processes. I don't want to jump through security hoops as I hop from one database to another. I don't want to put up with server problems and network issues because my data is distributed through multiple repositories. I don't want to learn different access methods for each application. I want a single uniform way to access my data. Maybe it's relational. Even better is a next generation hybrid repository which provides both relational and data networking so that I don't have to map from the real world model to the relational and then back again to retrieve the data. And, this being a CM world, if it supports data revisions (not just source code revisions), all the better, with perhaps some advanced large object handling for things such as source code, problem descriptions and requirements.

2. Data mining capability

OK. Now that I have all that data, all in one place, how do I get at it? I may have tens of thousands of files, hundreds of thousands of file revisions, thousands of problem reports, work breakdown structures, requirements trees, test cases, documents, etc.
Each has its large object components and each has its data component. (I don't like using the term meta-data here - that implies that the data is used to describe the large objects rather than the large objects having the same rank as the rest of the data for an object.) So how do I make sense of it all? You need a data mining capability - a powerful data query language and tools to explore and navigate, to help give hints on how to mine the data.

My job will be simpler if the data schema maps onto my real world of CM/ALM data and my query language lets me express things in real world terms: staff of <team member>, problems addressed by changes of <build record>, testcases of <requirement>. Even better, it will let me aggregate these things: testcases of <requirements tree members>, files affected by <changes>, etc. Then I don't have to iterate over a set of items.

A good data mining capability requires:
The query language must let me work with data sets easily, including the ability to do boolean set algebra and to transform data from one domain to another along the traceability (or other reference) links (e.g. from "changes" to "file revisions"). In the CM world, I need to be able to do some special operations - that is, operations that you may not normally find in a database. These include (1) taking a file (or file set) and identifying an ordered list of the revisions in the history of a particular revision, (2) taking a set of changes and identifying the list of affected files, (3) taking the root of a WBS or of a source code tree and expanding it to get all of the members, etc. The query language must also be sensitive to my context view. A change to a header file might affect a file in one context, but not in a different context. A next generation CM-based query language must be able to work with context views. It must work both within a consistent view and between views, such as when I want to identify the problems fixed between the customer's existing release and the one that we're about to ship to them.

And the schema must map closely to real world objects. I'd like to say: give me the members of this directory, and not: go through all objects and find those objects whose parent is this directory. I want to ask for the files of a change, not all files whose change number matches a particular change identifier. I want to be able to ask for the files modified by the changes implementing this feature - not the more complex relational equivalent. And I would prefer not to have to instruct the repository how to maintain inverted index lists so that the queries go rapidly.

A data summary capability should allow me to easily select and summarize the data. For example, graph problems for the current product and stream showing priority by status, and let me zoom into the most interesting areas at a click of the appropriate bar of the graph.

3. Easily customized metrics with dashboard presentation

Given a wealth of data and a means of mining that data, the next step would be to provide an easy way to express and to customize the metrics I'm interested in tracking. This generally requires some calculation and presentation specification language that can use the data query language to mine the data and compute the desired metrics in a manner that can be presented for comparison over specific points in time or along a continuum.

Depending on my role in the project, I may want a specifically designed dashboard. If my goal is to monitor product quality perhaps I want to look at metrics such as:
Metrics are also useful for evaluating teams. This is a scary one. Nobody wants their weaknesses pointed out. "Sure we have twice as many bugs, but we do three times as much work." How do you establish fair team metrics? This is where interpretation is tricky. Do you want more changes from a team member or fewer - what does it indicate? More functionality completed? Additional rework frequently happening? Incremental feature development versus big bang? A lot of the interpretation will depend on your process guidelines and procedures. But a lot will simply reflect the different work habits of your team members. So perhaps you want to overload metrics here and over time prune out the ones that aren't really useful or yield ambiguous interpretations.

When your metrics can be easily specified and collected into a dashboard or two, you'll start asking better questions and hopefully will improve your decision making. Maybe it will be as simple as noticing that 20% of the staff disappear at March break, so that had better be accounted for in the planning phase.

4. CM processes and applications that can reliably collect data

Remember that one of the key problems in establishing metrics is interpretation. For good interpretation, you need to collect meaningful and reliable data, and you need to have a yardstick against which the metrics can be interpreted. This is where the CM tools and processes must come to the rescue. A version control system that tracks all the file revisions and marks each one with the user, timestamp and description will not give me a lot of metrics, and the ones it does give me will not only be difficult to mine, but will be difficult to interpret. The CM process will demand traceability - from the requirements, through the tasks, to the software changes, to the builds and on to verification. Your CM tools must be able to handle these processes and must be able to collect data in a reliable manner.
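
To make the traceability idea concrete, here is a minimal sketch in Python of change-based records and two of the queries mentioned earlier: the files affected by a set of changes, and the problems fixed between two releases. Everything in it (the Change and Build records, the function names, the sample identifiers and paths) is a hypothetical illustration, not the schema or query language of any particular CM tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """A logical change (hypothetical record, not a real tool's schema)."""
    change_id: str
    problem_id: str      # problem report or feature the change addresses
    files: frozenset     # paths of the files the change touches


@dataclass
class Build:
    """A build record listing the changes it contains (hypothetical)."""
    name: str
    change_ids: set = field(default_factory=set)


def files_of(changes):
    """Follow the traceability link from changes to affected files."""
    affected = set()
    for change in changes:
        affected |= change.files
    return affected


def problems_fixed_between(old_build, new_build, changes_by_id):
    """Problems addressed by changes in new_build that old_build lacks."""
    new_change_ids = new_build.change_ids - old_build.change_ids  # set difference
    return {changes_by_id[cid].problem_id for cid in new_change_ids}


# Made-up sample data for illustration only.
changes = [
    Change("c1", "p10", frozenset({"src/io.c", "src/io.h"})),
    Change("c2", "p11", frozenset({"src/parse.c"})),
    Change("c3", "p12", frozenset({"src/io.c"})),
]
changes_by_id = {c.change_id: c for c in changes}

release_1 = Build("rel1", {"c1"})
release_2 = Build("rel2", {"c1", "c2", "c3"})

fixed = problems_fixed_between(release_1, release_2, changes_by_id)
touched = files_of(
    changes_by_id[cid] for cid in release_2.change_ids - release_1.change_ids
)
print(sorted(fixed))
print(sorted(touched))
```

The sketch only works because every change records the problem it addresses and the files it touches. A file-based tool that stores just user, timestamp and description per revision would have nothing to follow from a build back to a problem report.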
If your CM tools/processes dictate that you should not check in your source code until the build manager requests your changes, then it's no use trying to use the CM tool to measure how long it takes to complete a feature implementation. The completed code could be sitting on somebody's personal disk for weeks. If your CM tool is a file-based tool as opposed to a change-based tool, don't expect it to give you any metrics with respect to logical changes. If your data is stored within the version control file, don't expect to be able to get at it easily. If your processes don't require referencing an approved feature or accepted problem report as part of starting a new software change, expect a lot of variation on the traceability metrics. If it's not easy to search for potentially duplicate problems in your repository, expect a lot of duplicate problem reports, which could further skew your data.

It's one thing to say process first, then find the right tool, but perhaps it's better to say give me a tool that will both do the job and help me to define and continually refine my process so that the process decisions are enforced as they are made. Better yet, give me that same tool, but with a strong process supported out of the box. And for certain, customization must be easy to do so that the experts who want the metrics don't have to run to a technical resource to have them supported.

Metrics - What's Useful, What's Overhead

So when I'm developing metrics for a project, what's useful and what's overhead? The answer depends on numerous variables which are constantly changing. Define the metrics you might find useful. Create a large list if you want. Then turn the presentation of them on and off as required. Group them into dashboards according to role. Add to the list as you discover additional problems or trends. Make sure that the data you're using is reliable. Make sure that the processes support gathering good data.
Make sure that the interpretation of the data is unambiguous. Make the important metrics widely visible, especially if the team members' performance can affect the results. Even consider competitive metrics that pit one part of the team or one project against another.

Good process, good tools, good presentation. And meaningful metrics. These will make your metrics useful.

Joe Farah is the President and CEO of Neuma Technology and is a regular contributor to the CM Journal. Prior to co-founding Neuma in 1990 and directing the development of CM+, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto. You can contact Joe at farah@neuma.com .
There are a number of factors here. First of all, it's difficult for a company/project to determine its effectiveness on its own. You may have what you consider to be an ineffective team that is sometimes over budget and sometimes late with development, only to find by comparing to other companies/projects that "sometimes" is relatively good (still there's room for improvement). So to determine effectiveness, you have to identify metrics that a significant portion of the industry buys into, and then you need an appropriate amount of time to establish your project's measures. I like to look at things on a release by release basis. After you have done 2 or 3 releases, your metrics can be somewhat verified internally before being compared externally - but it really depends on the metric. Response time to a customer does not need a number of releases, but it does need a statistically valid number of incidents occurring both through peak times and less busy times.

Trend analysis should be conducted at the end of cycles. Now there are many cycles within cycles within cycles in a development shop. The end of a release cycle is typically a good time. But you may wish to look at different metrics at different times (e.g. quality at the end of verification cycles, complexity at the end of implementation cycles, etc.). A more critical factor is how often your process changes. If you typically change process between releases, the release cycle works well. But if you're doing continuous process change, you may want to have some level of continuous analysis going on.

One way to do this is to have a large chart starting at the beginning of the project and going forward by weeks along the bottom axis, and perhaps by months once the project reaches a few years in age. At the bottom of the chart indicate key events and milestones in the project, including Requirements/Specification freezes, development freezes, deliveries to verification for testing, end of verification cycles, etc.
Also identify significant process changes. Then plot your time-based metrics across the graph at various levels of the chart. This will provide a good map for identifying trends and relating them to the various events. And it will also allow you to see the effects of your process improvements. Hope this is helpful, Joe
IGJ said:
A pragmatic and well-presented viewpoint, Joe. Thank you. Could you enlighten me on your take on the longevity of metrics? E.g., to enable a project/company to determine its status and effectiveness, how long do you think metrics should be in place, and at what frequency do you think trend analysis should be conducted? Monthly? 6-monthly?