How Audit Trails and Traceability Mitigate Risk



Any time we make a change to production, there is a risk that something will go wrong. Risk is a fact of life. The only way to eliminate it is to never do anything, and of course that is not an option. So we are left with the actions we can take to mitigate those risks. Configuration management (CM) is mainly about risk mitigation. Everything we do in CM is designed to reduce the likelihood that things will go wrong, or to reduce the adverse impact when they do. CM helps us avoid many failures and recover quickly from the few we can't avoid. But how do audit trails and traceability fit into this picture? Traceability doesn't prevent errors, and an audit trail does little to help us recover from one. Does this mean they aren't valuable CM tools?

On the contrary, audit trails and traceability are two of our most important CM tools for learning how to mitigate risk.

Answering Six Questions

Both auditing and traceability are mechanisms that are designed to answer the six questions: who, what, when, where, why, and how. For example:

  • Who made the change? Who authorized it? Who knew about it?
  • What exactly was changed? And what was not changed?
  • When was the change made (especially relative to other activities)?
  • Where was the change made (e.g. what platform, repository, location)?
  • Why was the change made? What triggered the change or motivated the person?
  • How was the change made? What precisely was done and not done? How was information about the change captured and communicated?
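To make this concrete, here is a minimal sketch of an audit trail entry that records an answer to each of the six questions, plus a helper that reports which questions an entry leaves unanswered. The field names, the helper, and the sample values are hypothetical illustrations, not the schema of any particular CM tool.

```python
from dataclasses import dataclass, fields
from datetime import datetime, timezone
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class AuditRecord:
    """One audit trail entry, with a field for each of the six questions.

    Field names are illustrative assumptions, not a standard schema.
    """
    who: str                  # person who made the change
    authorized_by: str        # person who approved it ("Who authorized it?")
    what: Tuple[str, ...]     # configuration items that were changed
    when: Optional[datetime]  # when the change was made
    where: str                # platform, repository, or location
    why: str                  # change request or defect that triggered it
    how: str                  # procedure followed and any deviations


def unanswered(record: AuditRecord) -> List[str]:
    """Return the names of fields left empty (None, "", or an empty tuple)."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = AuditRecord(
    who="alice",
    authorized_by="bob",
    what=("billing/config.yaml",),
    when=datetime(2024, 3, 1, 14, 30, tzinfo=timezone.utc),
    where="prod-east",
    why="",  # nobody recorded which request triggered this change
    how="standard release checklist",
)
print(unanswered(record))  # ['why']
```

Making the record frozen reflects the idea that an audit trail entry should not be edited after the fact; a real trail would also need append-only storage, which this sketch does not attempt.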

Although it is tempting to use these six questions to place blame, doing so can actually be counterproductive! It is better to use this information to learn about how we handle changes and what we can do to make them go more smoothly in the future.

Learning from "Who?"

The "who" of change primarily revolves around the availability of pertinent information. Were the "right" people involved? Did the person who made the change have the information that was necessary to make an informed choice about doing it? The same question can be asked about the person who approved the change.

Because any change can affect other activities, a related question follows: did the people to whom this change was pertinent know about it when they needed that information?

Learning from "What?"

The "what" of change points us to issues of completeness. Were changes made to all of the things that should have been changed? What things were missed, and why were they missed?

The converse concerns changing more than should have been changed. Was the person making the change overzealous? Did he or she slip in other changes that were not authorized?
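Both of these questions reduce to comparing what was supposed to change with what actually changed. The sketch below assumes traceability gives us the set of items a change request authorized and the audit trail gives us the set of items actually modified; the function name and file paths are hypothetical.

```python
from typing import Dict, Set


def compare_changes(planned: Set[str], actual: Set[str]) -> Dict[str, Set[str]]:
    """Compare what a change request authorized with what was actually changed."""
    return {
        "missed": planned - actual,        # should have changed, but didn't
        "unauthorized": actual - planned,  # changed without authorization
    }


planned = {"api/handler.py", "api/schema.sql", "docs/api.md"}
actual = {"api/handler.py", "api/schema.sql", "config/flags.yaml"}

result = compare_changes(planned, actual)
print(result["missed"])        # {'docs/api.md'}
print(result["unauthorized"])  # {'config/flags.yaml'}
```

An empty "missed" set answers the completeness question, and an empty "unauthorized" set answers the overzealousness question. The same comparison could serve the "where" question by using deployment targets in place of file names.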

Learning from "When?"

The "when" of change deals with synchronization. Were there activities that should have been completed before the change was made? Were there activities that should have waited until after it? Did this change affect other changes?

The "when" may also concern the timing of the change relative to the business use of the system that was changed. Was there some business activity that should not have been interrupted to make the change? Or was there a reason to postpone the change until after a special event, such as month-end closing?
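If the audit trail timestamps each activity, both kinds of "when" question can be checked mechanically. Here is a minimal sketch, assuming we know which activities had to finish before the change and which periods (such as a month-end close) were off-limits for changes; the function, activity names, and dates are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def timing_problems(
    change_at: datetime,
    prerequisites_done: Dict[str, Optional[datetime]],
    freeze_windows: List[Tuple[datetime, datetime]],
) -> List[str]:
    """Flag timing issues for a single change, using audit trail timestamps.

    prerequisites_done maps each prerequisite activity to its completion time,
    or None if the trail shows no completion. freeze_windows lists (start, end)
    periods during which changes should not have been made.
    """
    problems = []
    for name, done_at in prerequisites_done.items():
        if done_at is None or done_at > change_at:
            problems.append(f"prerequisite not finished first: {name}")
    for start, end in freeze_windows:
        if start <= change_at < end:
            problems.append(f"made during freeze {start:%Y-%m-%d} to {end:%Y-%m-%d}")
    return problems


change_at = datetime(2024, 3, 31, 22, 0)
prerequisites = {
    "database backup": datetime(2024, 3, 31, 21, 0),
    "regression tests": None,  # no completion recorded in the trail
}
month_end_close = [(datetime(2024, 3, 29), datetime(2024, 4, 2))]
for problem in timing_problems(change_at, prerequisites, month_end_close):
    print(problem)
```

Here the change would be flagged twice: the regression tests have no recorded completion, and the change landed inside the month-end freeze.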

Learning from "Where?"

The "where" of a change leads us to distribution issues. Was the change made in all of the places that it should have been made? Was it distributed to all of the people who should have received it? And was it installed in each place in a timely manner?

Conversely, were there places where the change should not have been made? Did it go beyond its intended target? How far afield did it go?

Learning from "Why?"

The "why" of a change can be the most important question. What are the reasons behind the change, and what might have been the argument against making it? Were all of the pertinent facts considered? Did the right questions get asked? Were the risks understood by those who decided to make the change?

Learning from "How?"

Finally, the "how" of a change concerns its execution, and people make mistakes. Was the failure simply a human one? Did the person skip steps, do things incorrectly, or cut corners? Did the people have the requisite skills and training to do the right things? Do they understand why the change procedures exist and the value of following each step? Do they have tools that help them work consistently and efficiently as they do their parts?

Learning from the Six Questions

If we are not going to use this information to place blame, then what will we use it for? To learn!

While we should strive to never make a mistake, we must always view mistakes as an opportunity to improve. At least one of the six questions (likely more than one) will point us to shortcomings in our CM processes. Perhaps some procedures need to be more specific than they are. Maybe someone who should have been involved was overlooked. Training may be called for. Perhaps some checks and balances are needed.

The more information we have about the problem at hand, the more we can learn from it, and the better our improved process will be. That brings us back to why audit trails and traceability are necessary: they are our primary source of this goldmine of information. With little or no information, we are left with people's memories and imaginations to identify the root causes of problems. With hard data, a clear path to an improved process is likely to emerge.

So in the end, audit trails and traceability really are risk mitigation mechanisms. Although they don't directly help us to avoid risks, they do position us to learn from each failure, and hopefully avoid repeating it in the future!


CMCrossroads is a TechWell community.
