Trustworthy Transparency over Tiresome Traceability
Many in the agile community, especially the eXtreme Programming community, see red whenever they encounter that maddening “T-word”: traceability. Almost instantaneous distrust sets in against those who would dare to utter it, much less recommend it. On the other side of the fence, we have many agile skeptics who misunderstand (or have all too often seen misapplied) the Agile Manifesto's tenet of "working software over comprehensive documentation" to mean "We're Agile! We don't do documentation!"
The position of this month's article on traceability is more "lean" than "agile." We base this on the XP- and Scrum-centric views that were expressed in the March 2004 YahooGroup discussion thread Why Traceability? Can it be Agile? The "tests over traceability" discussion is probably a valid summary of the XP/Scrum perspective from that thread.
From his recent stint at Microsoft, David Anderson would probably say something more along the lines of "transparency over traceability", where we acknowledge the important goals that traceability is trying to fulfill, but don't necessarily accept many of the traditional ways of trying to attain it. David, in particular, has written about "trustworthy transparency" and "naked projects". These are projects that are so transparent and visible in their status/accounting that they seem "naked". Here is an excerpt from David on the subject of Changing the Software Engineering Culture with Trustworthy Transparency:
"Modern software tooling innovation allows the tracking of work performed by engineers and transparent reporting of that work in various formats suitable for everything from day-to-day management and team organization to monthly and quarterly senior executive reporting. Modern work item tracking is coupled to version control systems and aware of analysis, design, coding and testing transitions. This makes it not only transparent but trustworthy. Not only can a tool tell you the health of a project based on the state of completion of every work item, but this information is reliable and trustworthy because it is tightly coupled to the system of software engineering and the artifacts produced by it. The age of trustworthy transparency in software engineering is upon us. Trustworthy transparency changes the culture in an organization and enables change that unleashes significant gains in productivity and initial quality. However, transparency and managing based on objective study of reality strains existing software engineering culture as all the old rules, obfuscation, economies of truth, wishful thinking and subjective decision making must be cast aside. What can you expect, how will you cope and how can you harness the power of trustworthy transparency in your organization?"
In the past we have written about The Trouble with Tracing: Traceability Dissected where we described:
- The ten "commandments" of traceability
- Nine common complaints about traceability
- Eight reasons for traceability
- The seven functions of SCM
- The six "facets" of traceability
- The five orders of traceability
- The four "rings" of enterprise/stakeholder visibility
- The three driving forces for traceability
- The two overarching objectives of traceability
- One ultimate mission/vision: fostering trust
To make a long (but hopefully interesting) story short, we concluded that the overarching objectives of traceability are:
- Transparency: the ability to readily view all the information we are concerned with
- Identification: the ability to identify our concerns so we can separate independent sets of concerns and cohesively associate the related ones.
These are supposed to be ideals that help to engender trust. Can we achieve transparency and identification (and hence "navigable knowledge-spaces") without more traditional traceability methods? If so, what are the different ways of doing it?
Given our CM backgrounds, we differ strongly from many of the vocal opinions expressed in the XP community when it comes to the use of tools for tracking requests/changes. We are strongly in favor of using a "good" tracking tool. Index cards are a great and valuable "tool" for eliciting dialogue and interaction with the "customer," and some of us even use them for this purpose, along with post-it notes. From a CM perspective, though, index cards alone simply do not suffice as a serious means of storing, tracking, sorting, searching, and slicing & dicing development change requests. We do believe some extent of traceability is necessary, and that while it is not necessarily "agile," it can, and should, be "lean" and streamlined. It should also serve the purpose of transparency, visibility, and status-accounting rather than being a goal in itself.
There are viable alternatives to achieving transparency and identification other than what many regard as "formal traceability." Some of these include:
- Direct customer communication, and better ways to make project information more transparent to the customer
- Ways of applying principles of simplicity and minimizing intermediate artifacts
- Adapting XP's 4 rules of "simple code" to other kinds of information as well, as long as those rules can be translated and interpreted correctly for such non-code artifacts
- Applying the principle of locality of reference and the DRY principle in an attempt to single source information
Many of these concepts and more are embodied in Sam Guckenheimer's recent book Software Engineering with Microsoft Visual Studio Team System. We found this book to be surprisingly good (outstanding even) and not at all what one might expect given the apparent tool/vendor-specific nature suggested by the title. The value-up paradigm, and most of the other concepts and values in the book, are very well aligned with agility while still meeting the needs of more rigorous ceremony in their software and systems engineering efforts. Guckenheimer describes the basic differences as follows:
Attitudinal Differences Between Work-Down and Value-Up Paradigms
Planning and change process
- Work-Down: Planning and design are the most important activities to get right. You need to do these initially, establish accountability to the plan, monitor against the plan, and carefully prevent change from creeping in.
- Value-Up: Change happens; embrace it. Planning and design will continue through the project. Therefore, you should invest in just enough planning and design to understand risk and to manage the next small increment.

Measurement
- Work-Down: Task completion. Because we know the steps to achieve the end goal, we can measure every intermediate deliverable and compute earned value as the percentage of hours planned to be spent by now versus the hours planned to be spent to completion.
- Value-Up: Only deliverables that the customer values count (working software, completed documentation, etc.). You need to measure the flow of the work streams by managing queues that deliver customer value and treat all interim measures skeptically.

Definition of quality
- Work-Down: Conformance to specification. That's why you need to get the specs right at the beginning.
- Value-Up: Value to the customer. This perception can (and probably will) change. The customer might not be able to articulate how to deliver the value until working software is initially delivered. Therefore, keep options open, optimize for continual delivery, and don't specify too much too soon.

Acceptance of variance
- Work-Down: Tasks can be identified and estimated in a deterministic way. You don't need to pay attention to variance.
- Value-Up: Variance is part of all process flows, natural and man-made. To achieve predictability, you need to understand and reduce the variance.

Intermediate work products
- Work-Down: Documents, models, and other intermediate artifacts are necessary to decompose the design and plan tasks, and they provide the necessary way to measure intermediate progress.
- Value-Up: Intermediate documentation should minimize the uncertainty and variation in order to improve flow. Beyond that, it is unnecessary.

Handling of constraints
- Work-Down: The constraints of time, resource, functionality, and quality determine what you can achieve. If you adjust one, you need to adjust the others. Control change carefully to make sure that there are no unmanaged changes to the plan.
- Value-Up: The constraints may or may not be related to time, resource, functionality, or quality. Instead, identify the primary bottleneck in the flow of value, work it until it is no longer the primary one, and then attack the next one. Keep reducing variance to ensure smoother flow.

Approach to trust
- Work-Down: People need to be monitored and compared to standards. Management should use incentives to reward individuals for their performance relative to the plan.
- Value-Up: Pride of workmanship and teamwork are more effective motivators than individual incentives. Trustworthy transparency, where all team members can see the overall team's performance data, works better than management directives.
There are several strategies and tactics that can be employed to achieve "lean" traceability in service to trustworthy transparency and friction-free metrics.
A "lean" approach of traceability would focus on the following:
1. If one uses single-piece flow and makes changes at the granularity that TDD mandates, then software-level requirements, design, coding, and testing are all part of the same task. Tracking them to a single record-id in the change-tracking system and version-control tool actually goes a long way toward traceability. It is much more work, and creates many more intermediate artifacts, when these activities are all separated over time (different lifecycle phases), space (different artifacts), and people (different roles/organizations). Agilists object when traceability efforts noticeably interfere with "flow." It is important to minimize intermediate artifacts and other perceived forms of "waste" (such as over-specifying requirements, or too many requirements "up front") because fewer artifacts means fewer things to trace.
2. Collocating both people and artifacts: the former for communication, and the latter for locality of reference among those artifacts that are deemed necessary. This also entails single-sourcing information whenever possible; a project wiki is one common, proven way of doing this.
3. Coarse granularity and modularity/factoring of what is traced: tracing at the highest practical level of granularity. For example, is it practical to trace to the individual requirement, or to the use-case? To the line of code, to the method/subroutine, or to the class/module? This is about "simple design" and "(re)factoring" as applied to the structure of the traced entities and their relationships.
4. Transparent, frictionless automation of the terribly taxing and tiresome tedium of traceability. Focus on taking the tedium out of manual traceability by streamlining and automating it as much as possible. Ideally this would happen seamlessly behind the scenes, as with Jane Cleland-Huang's event-based traceability (EBT), or through the use of a common environment "event" catcher within Eclipse or MS Team System server. This would most likely be accomplished by using a task-based, test-driven (TDD) or feature-driven (FDD) approach.
So what are some of these specific ways of attaining lean traceability in accordance with Agile values, lean principles, the value-up paradigm, and of course the basic tenets of sound CM?
Recognize That Traceability is Not Tracing
One of the first ways is to understand and recognize the difference between traceability and tracing. Traceability is the problem that needs to be solved. Tracing is one way of solving it. Unfortunately, many of the known ways of tracing are actually what cause the familiar moans and groans when you utter the dreaded "T-word" to many developers. If we have trustworthy transparency, then traceability requires a systematic means of utilizing that transparency so that we can quickly connect the dots between any two points in the transparent information that has been made available. Trustworthy transparency is great in that it makes all the information available to us. We still have to figure out how to connect those dots, though. That is where traceability fits in: by providing us the ability to identify those paths between the "dots" that we deem to be important.
Use Version-Control and Change-Tracking Tools
What are some of those dots, and some "lean ways" of connecting them? First and foremost is the fundamental practice of version control, integrated with basic change tracking. The version-control system, its repositories, and the change-tracking system (and its repository) are the cornerstone of transparency into our process and its lifecycle. They capture and record the residue of our activities and outputs from the moment a request shows up on our radar, through its elaboration and transformation into various lifecycle artifacts, and ultimately into the code and tests that are integrated, built, tested, released, and promoted/delivered.
Basic Integration between Version-Control and Change-Tracking
One of the most basic ways to help connect and navigate this information is with a task-based approach that links every action and event in the version-control system with a corresponding action and event in the tracking system. This can be as simple as having the identifier of a task in the tracking system associated with every transaction in the version-control system.
As a recent real world example of good industry practice, Robert has been working with Camelot, the UK Lottery operator, on integrating Perforce's SCM tool and Serena's TeamTrack defect/issue tracker. Camelot has stringent audit requirements on their code and change control and, ultimately, is responsible to the UK's National Lottery Commission for the integrity of their systems. Perforce replaced Serena's Version Manager for SCM. The integration ensures that all developer check-ins using Perforce are associated with an approved TeamTrack issue and also with an audit trail of changes to those associations. This combines the benefits of TeamTrack's workflow capabilities with the solid SCM capabilities of Perforce.
Such integrations are successful when you:
- Minimize the replication of information between the two systems (following the DRY principle) - replicate only those fields that are necessary
- Seek to avoid concurrent updates to particular fields - i.e. make one system the master for any replicated field
- Provide quick and simple ways (URLs work very well) for a user to switch from one system to the other to view the corresponding issue or changelist(s) - manually copying and pasting an identifier, or requiring the user to search in the other system, really increases resistance and slows things down
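As a hedged illustration of these guidelines, here is a minimal Python sketch of the check-in side of such an integration. The issue-ID format, URL schemes, and function names are all invented for illustration; they are not the actual Perforce or TeamTrack interfaces:

```python
import re

# Hypothetical convention: every commit description must begin with an
# approved issue ID such as "TT-1234: ..." (the "TT-" prefix is an
# assumption for this sketch, not a real TeamTrack format).
ISSUE_ID = re.compile(r"^(TT-\d+):")

# One-way replication: the tracker is the master for issue metadata; the
# version-control side stores only the ID (the DRY principle above).
TRACKER_URL = "https://tracker.example.com/issues/{id}"   # assumed URL scheme
CHANGE_URL = "https://scm.example.com/changes/{change}"   # assumed URL scheme

def check_commit(description: str, change_number: int) -> dict:
    """Reject commits without an issue ID; otherwise emit the cross-links
    that let a user jump between the two systems with one click."""
    match = ISSUE_ID.match(description)
    if not match:
        raise ValueError("commit rejected: no approved issue ID in description")
    issue = match.group(1)
    return {
        "issue": issue,
        "issue_url": TRACKER_URL.format(id=issue),
        "change_url": CHANGE_URL.format(change=change_number),
    }

links = check_commit("TT-1234: fix rounding in payout calculation", 8512)
print(links["issue"])  # TT-1234
```

A real integration would run a check like this in a server-side trigger, so the association (and its audit trail) is created at commit time rather than reconstructed after the fact.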
Task-Based Development (TBD)
The practices in Principles of Agile Version Control: From OOD to TBD build upon this foundation by helping to organize the structure of work in the version-control system in a manner that is easily aligned with the change-tracking system. Tasks are the common thread linking work-requests to work-outputs. And if all the work-products created across the development lifecycle (requirements, models, code, tests, etc.) are in the version-control repositories and are connected in this manner, then the capability to trace across lifecycle activities and artifacts has been provided.
Test-Driven Development (TDD)
Use of Test-Driven-Development (TDD) in combination with task-based development can provide even stronger traceability support. This is because TDD basically slices the work up into very small, full lifecycle activities where each task touches only the artifacts needed to fully implement a requirement at a time. Contrast this with a strict waterfall approach where the activities that document requirements are separate from the design and code activities. In TDD, the same activity (and its ID) can easily link together all lifecycle artifacts from requirements to design, code and tests.
Of course, this only works to the extent where you can start TDD. In many larger projects, a significant amount of previous requirements specification and decomposition may still be necessary before TDD can commence. Even in these cases, the requirements decomposition activity creates larger-grained clusters which are then easily traced across the lifecycle via TDD and TBD. We should also mention Behavior-Driven Development (BDD). BDD is an evolution of TDD that also came from the Agile community in an attempt to address some of the weaknesses of TDD. It, too, can be a very effective means of providing this same form of traceability.
Still, there is lots of work to be done in querying and collating this information to produce any required reports, and one must be careful to strike a balance between constraining processes for ease of tracing versus not being able to trace at all. This is where databases and tracking tools give us the automation benefits that index cards alone simply cannot provide.
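To make the TDD/TBD linkage concrete, here is a toy Python sketch (the requirement ID, function names, and the `realizes` helper are all hypothetical) showing how a single ID can thread a requirement through its test and the code that satisfies it:

```python
# Trace table gathered as tests are defined: (requirement_id, test_name) pairs.
TRACE = []

def realizes(req_id):
    """Decorator recording which requirement/task a TDD test verifies."""
    def tag(test_fn):
        TRACE.append((req_id, test_fn.__name__))
        test_fn.requirement = req_id  # the ID also rides along on the test
        return test_fn
    return tag

# The code under test, written to make the test pass (the TDD step).
def payout(stake, odds):
    return stake * odds

@realizes("REQ-42")
def test_payout_is_stake_times_odds():
    assert payout(10, 3) == 30

test_payout_is_stake_times_odds()
print(TRACE)  # [('REQ-42', 'test_payout_is_stake_times_odds')]
```

Because the same ID travels with the test and (via task-based commits) with the change-set, the requirement-to-test-to-code trace falls out of the normal workflow instead of being maintained by hand.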
Simple Tools: Wikis, and Wiki-based Specification Frameworks
It is often easy to go overboard with hi-tech tools and their capabilities. Many tools today provide awesomely powerful environments that are specific to artifacts in just one portion of the overall lifecycle: requirements management, test management, code management, model-driven design and code generation. They also add a lot of painful complexity when each of those tools uses its own archiving formats and versioning mechanisms. It may be easy to trace between related artifacts in the same repository and lifecycle phase. Tracing across them, though, can quickly become a bigger headache than the far more primitive alternative of simply creating all those artifacts in simpler, text-based formats in the same version-control tool.
For this reason, a simple but powerful, text-based wiki-web with versioning capabilities can be a very capable traceability tool linking code artifacts with other text-based artifacts that can be readily created in a hierarchical fashion with simple hyperlinks to indicate traces to other related information. Building even more upon this are acceptance-testing frameworks like FIT and FitNesse.
FIT and FitNesse are attempts to use executable tests as the mechanism for specifying software requirements. Many an XP developer would like to say that they can use an executable test in place of a documented requirement. That's not strictly correct if the code is the only form of documentation of that requirement. However, if the requirement is stated in a way that is both simple and SMART (specific, measurable, attainable, relevant/realizable, time-bound) then it can be trivial to associate the requirements statement with the test-code.
FIT and FitNesse essentially provide a simplistic form of "domain-specific language" that can automatically generate, or help generate, automated executable acceptance tests for limited forms of requirements statements. For a more state-of-the-art view on the lengths this can be taken to, look no further than Charles Simonyi's Intentional Software.
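The table-driven idea can be sketched in a few lines of Python. This is not the real FIT/FitNesse API, just an illustration of a requirement stated as a wiki-style table being executed against a hypothetical fixture:

```python
# A requirement expressed as a table: each row is a concrete, testable case.
SPEC = """
stake | odds | expected_payout
10    | 3    | 30
5     | 2    | 10
"""

# The fixture: the production code the table is checked against.
def payout(stake, odds):
    return stake * odds

def run_table(spec, fixture):
    """Parse the table rows and report pass/fail for each one."""
    rows = [r.split("|") for r in spec.strip().splitlines()[1:]]
    results = []
    for stake, odds, expected in rows:
        actual = fixture(int(stake), int(odds))
        results.append(actual == int(expected))
    return results

print(run_table(SPEC, payout))  # [True, True]
```

The table itself is both the requirement statement and the trace to the test that verifies it, which is exactly the single-sourcing this section advocates.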
Event-Based Traceability (EBT)
Event-Based Traceability (EBT) is a means of automatically creating traces based on tight integration between versioning and tracking tools. It is based on work-context that can be deduced from what we are doing at the time in the development environment or IDE. Actions and events in the IDE can trigger actions and events in these other tools and, together with work-context information about the artifact (e.g., the code) being edited and its structure (and, for example, a task-ID), these events and actions can be logged in some structural format (XML as one example). Queries and filters can be created to produce traceability reports and selectively decide which events and requirements to trace and to what granularity.
MS Visual-Studio TeamSystem has an event server and logging capabilities built in that can directly support this form of traceability. Events from the versioning tools, code IDEs, word-processors and spreadsheets, build tools, test tools, and more can all communicate with the event server and processes that "listen" for certain kinds of events can take corresponding actions. The Eclipse Application Lifecycle Framework (ALF) Project takes a similar event-based approach to integrating different tools in the lifecycle.
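A minimal sketch of the EBT idea in Python, with invented event fields and task names (real implementations log richer, often XML-structured, records and listen on a real event bus):

```python
from collections import defaultdict

# The event log a listener process would accumulate.
EVENT_LOG = []

def emit(task_id, artifact, kind, operation):
    """What an IDE or versioning plug-in would broadcast on save/commit."""
    EVENT_LOG.append({"task": task_id, "artifact": artifact,
                      "kind": kind, "op": operation})

# Events fired as one task moves through the lifecycle (all names invented).
emit("TASK-7", "specs/payout.txt", "requirement", "edit")
emit("TASK-7", "src/payout.py", "code", "edit")
emit("TASK-7", "tests/test_payout.py", "test", "edit")
emit("TASK-7", "changelist 8512", "change-set", "commit")

def trace(task_id):
    """Connect the dots: everything touched under one task, grouped by kind."""
    by_kind = defaultdict(list)
    for event in EVENT_LOG:
        if event["task"] == task_id:
            by_kind[event["kind"]].append(event["artifact"])
    return dict(by_kind)

print(trace("TASK-7"))
```

The point is that no one ever "writes a trace": the query over logged events reconstructs the requirement-to-code-to-test linkage on demand.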
TDD/BDD + TBD + IDE = EBT 4 Free?
Looking more closely at the inter-relationships between Test-Driven Development (TDD), Task-Based Development (TBD), and a spiffy integrated development environment (IDE) such as Eclipse: recall that if one is following TDD, or its recent offshoot Behavior-Driven Development (BDD), then one starts to develop a feature by taking the smallest possible requirement/behavior that can be tested and writing a test for it. The code is then made to pass the test. It is then refactored, the next testable behavior is developed, and so on, until the feature is done.
- With TDD/BDD, a single engineering task takes a single requirement through the entire lifecycle: specification (writing the test for the behavior), implementation (coding the behavior), verification (passing the test for the behavior), and design (refactoring the code).
- Traceability for a single requirement through to specs, design, code, and test is much harder to establish and maintain if those things are all splintered and fragmented across many disjointed tasks and engineers over many weeks or months.
- If the same engineering task is focused on taking just that one single requirement through its full lifecycle, and if I am doing task-based development in my version control tool, then...
- The change-set that I commit to the repository at the end of my change-task represents all of that work across the entire lifecycle of the realization of just that one requirement. The ID of that one task or requirement can then be associated with the change-set as a result of the commit operation/event taking place.
And voila! We have almost automatically taken care of much of the traceability burden for that requirement! If we had an IDE that gave us seamless integration and event/message passing between our change/task-tracking tool, our version-control tool, and the interface we use to edit code, models, requirements, etc., then:
- The IDE could easily know what kind of artifact we're working on (requirement, design, code, test)
- Operations in the IDE and the version-control tool would be able to broadcast "events" that carry my current context (task, artifact type, operation) and could automatically create a "traceability link" in the appropriate place.
- CASE tools and protocols like Sun's ToolTalk and HP's SoftBench tried to do this over a decade ago, but agile methods weren't as formalized then, and teams weren't necessarily working in a TDD/TBD fashion. This is what Event-Based Traceability (EBT) is trying to help achieve. If we had the appropriate tool plug-ins (e.g., for Eclipse or MS TeamSystem) and also used TDD/BDD with TBD in this IDE, we might just be able to get EBT for free, or at least come pretty close!
Search-based Traceability - Traceability without Tracing
Event-based traceability can be incredibly powerful and easy to use, as it eliminates most of the manual traces that would otherwise have to be created between artifacts and (meta)data across the multiple lifecycle phases and repositories. Within a set of phase-specific artifacts, a simple (or hi-tech) tool could be used that makes it very convenient to place linkages between items in the same artifact and/or of the same type. These could include simple cross-references and item structuring/decomposition.
Search-based traceability is one of the more recent "hot topics" in traceability research. It does not require manual (or even automatic) insertion of traces between related pieces of information. Instead, it relies on smart, probabilistic and context-sensitive searching of a project's information, assets and data to dynamically infer linkages. These are then reported, based on the specific set of search keywords used.
Event and tool-integration-based tracing may still be more convenient and effective for relating information about who did what, when, and where. Search-based traceability can be more powerful, convenient, and effective at tracing information across the lifecycle and the value-chain, though. This is particularly true when it comes to answering questions like "Why?" and analyzing the impacts and relationships of logical items and terms across a project's knowledgebase. This also includes less formally structured information such as project blogs, mail-lists, wikis, etc.
In its most advanced form, search-based traceability utilizes information-retrieval methods such as the vector space model, latent semantic indexing, or probabilistic network models to dynamically generate traces at runtime.
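As a toy illustration of the vector-space approach, the following Python sketch scores artifacts against a query by cosine similarity of raw term-frequency vectors; real trace-retrieval tools add IDF weighting, stemming, and probabilistic models. All artifact names and texts here are invented:

```python
import math
from collections import Counter

def vectorize(text):
    """Term-frequency vector of a text (no stemming or stop-words here)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# A miniature project knowledgebase (contents invented for illustration).
ARTIFACTS = {
    "REQ-42": "payout must equal stake times odds",
    "test_payout.py": "assert payout equals stake times odds",
    "build.md": "how to configure the nightly build server",
}

def candidate_links(query, threshold=0.3):
    """Dynamically infer candidate trace links, best matches first."""
    q = vectorize(query)
    scored = {name: cosine(q, vectorize(text))
              for name, text in ARTIFACTS.items()}
    return sorted((n for n, s in scored.items() if s >= threshold),
                  key=lambda n: -scored[n])

print(candidate_links("payout stake odds"))
```

Note that nothing was ever manually linked: the trace between the requirement and its test is inferred at query time, which is exactly the "traceability without tracing" idea.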
“The effectiveness of automated traceability is measured using the standard metrics of recall and precision, where recall measures the number of correct links that are retrieved by the tool, and precision measures the number of correct links out of the total number of retrieved links. Numerous experiments, conducted using both experimental data sets as well as industry and government data sources, have consistently shown that when recall levels of 90-95% are targeted, precision of 10-35% is generally obtainable. [Recall indicates how many of the items that should have been found are effectively extracted. Precision indicates how many of the extracted items are correct. Usually there is a trade-off of recall against precision.] This means that automated traceability methods require a human analyst to manually evaluate the candidate links returned by the tool and to filter out the incorrect ones. Automated trace retrieval, while no silver-bullet, is increasingly recognized by industry as a potential traceability solution. Prototype tools such as Poirot are used in industrial pilot studies. The new Center of Excellence in Traceability has been established specifically to address these issues.”
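The recall and precision arithmetic from the excerpt can be made concrete with a small example (the link sets below are invented for illustration):

```python
def recall_precision(retrieved, correct):
    """recall = correct links retrieved / all correct links that exist;
    precision = correct links retrieved / all links retrieved."""
    true_hits = len(retrieved & correct)
    recall = true_hits / len(correct)
    precision = true_hits / len(retrieved)
    return recall, precision

# Ground truth: the links that actually exist (hypothetical pairs).
correct = {("REQ-1", "a.py"), ("REQ-2", "b.py"), ("REQ-3", "c.py")}

# What the tool returned: two correct links plus two false positives.
retrieved = {("REQ-1", "a.py"), ("REQ-2", "b.py"),
             ("REQ-2", "x.py"), ("REQ-9", "y.py")}

r, p = recall_precision(retrieved, correct)
print(r, p)  # recall 2/3, precision 1/2
```

Targeting high recall, as the excerpt notes, typically drags precision down, which is why a human analyst still has to filter the candidate links.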
Simple Search-based Traceability ("Just Google It!")
We needn't wait for prototypes to develop further in order to take advantage of some of the powerful ideas in search-based traceability. Available search-engines can be used today with the same kind of simple-tools previously mentioned in this article. This is particularly true when they are used in conjunction with Wiki-webs, source code and document repositories, as well as project mail-lists and blogs.
The minimum requirement is determining the relevant terms or project vocabulary that will yield meaningful search results across the project or related projects. Wiki-webs are well-suited for defining project glossaries and terms, both as definitions and as documentation of key concepts, patterns, and domain-specific terminology. If some discipline and conventions/standards can be applied to the consistent communication and use of such terms and vocabulary, then a sensible yet simple combination of all the other mechanisms above can contribute to a lean yet practical traceability solution that covers all or most of a project's traceability requirements.
Traceability should introduce no friction to the development process, particularly if it is to win over agilists. Focusing on task-based and test-driven development, continuous integration, single-piece flow, and single-sourced information with a minimum of intermediate artifacts makes such low-friction traceability attainable with many tools.
Combining basic version-control tool integration with build & test automation (with event logging, notification, and subscription) can automatically log and track tracing information from these activities, which can then be readily queried or scripted to produce necessary traceability reports.
A simple wiki-mechanism (such as Trac, FIT, FitNesse), to define and organize project terms/concepts, use-cases or requests, and related project content can go a long way toward achieving single-sourcing of information with appropriate linkages for subsequent querying & reporting. Use of readily available search-engines on an existing project's knowledgebase (project wikis, blogs, mail-lists, code, specs, docs, tests, models, etc.), along with consistent use of a project's terminology, can fill in many of the blanks for tracing across the lifecycle and its artifacts.
Promising approaches, such as event-based traceability, along with more sophisticated information-retrieval methods, can help automate this and, indeed, have been implemented in several tools, thus raising the bar for the industry in this area.