Enhancing CM Tools with Triggers
Monday, 02 October 2006

There are two ways of enhancing CM tools: wrappers and triggers. Wrappers "wrap" the execution of command-line calls or API calls, while triggers are invoked from within the CM tools themselves via hooks or API calls. Wrappers are "old school" and they work, but they generally can only block a command or do post-command processing. Triggers, since they are invoked from within the tools themselves, allow a finer granularity, especially when dealing with macro-level commands. The rest of this article concerns itself only with triggers, though the general concepts also apply to wrappers.

There are three basic types of triggers:

  • Blocking
  • Pre-processing
  • Post-processing

Blocking triggers evaluate criteria to determine whether the specified action should be allowed to proceed. For example, a blocking trigger can prevent a check-in when no revision comment is present.
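
A minimal sketch of such a blocking trigger follows. It assumes, hypothetically, that the CM tool passes the revision comment as the first command-line argument and treats a non-zero exit status as "block the action"; real tools differ in how they hand data to a trigger, so check the tool's own documentation.

```python
import sys


def allow_checkin(comment):
    """Return True when the revision comment contains non-whitespace text."""
    return bool(comment and comment.strip())


def main(argv):
    # Assumption: argv[1] holds the revision comment (hypothetical convention).
    comment = argv[1] if len(argv) > 1 else ""
    if allow_checkin(comment):
        return 0  # zero: let the check-in proceed
    print("check-in rejected: a revision comment is required", file=sys.stderr)
    return 1  # non-zero: block the check-in
```

A trigger script built on this would end with `sys.exit(main(sys.argv))`. The check touches no files and no network, which keeps it fast.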

Pre-processing triggers often set up or initiate other actions before the triggering action is executed. For example, a pre-processing trigger can add the filespec (directory, filename and revision) of a file being checked in to one or more Change Set lists, based on information parsed from the revision comment.
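
The Change Set example might look something like the sketch below. The `CS:name` tag format, the `.lst` file naming, and the `directory/filename@revision` filespec layout are all assumptions made for illustration; none of them comes from a particular CM tool.

```python
import os
import re

# Hypothetical convention: a comment names its change sets as "CS:<name>".
CHANGE_SET_TAG = re.compile(r"\bCS:([A-Za-z0-9_-]+)")


def change_sets_in(comment):
    """Return the change set names mentioned in a revision comment."""
    return CHANGE_SET_TAG.findall(comment)


def record_filespec(list_dir, comment, directory, filename, revision):
    """Append the filespec to one list file per change set named in the comment."""
    filespec = f"{directory}/{filename}@{revision}"
    names = change_sets_in(comment)
    for name in names:
        list_path = os.path.join(list_dir, name + ".lst")
        with open(list_path, "a", encoding="utf-8") as fh:
            fh.write(filespec + "\n")
    return names
```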

Post-processing triggers deal with the results of an action, often extending the effects of that action. Not all tools support failure-mode post-processing triggers, so pre-processing operations are often done here instead since, if the action failed, the pre-processing work could not otherwise be "rolled back." For example, the Defect ID(s) parsed from a revision comment during a check-in are used to update the appropriate record(s) in the DIET (Defect, Issues and Enhancement Tracking) system.
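
The parsing half of that example can be sketched as follows. The `DIET-<number>` tag format is a hypothetical convention chosen for illustration, and the update of the tracking system itself is left out because it depends entirely on the system in use.

```python
import re

# Hypothetical convention: defects are referenced as "DIET-<number>".
DEFECT_TAG = re.compile(r"\bDIET-(\d+)\b")


def defect_ids(comment):
    """Return the defect IDs in a comment, in first-seen order, without repeats."""
    seen = []
    for digits in DEFECT_TAG.findall(comment):
        defect = int(digits)
        if defect not in seen:
            seen.append(defect)
    return seen
```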

One of the biggest penalties of using triggers is that they take time to execute. Most of the time they are executed by a tool's internal trigger mechanism but exist outside of the tool proper as either scripts or custom executables. This means that every time a trigger fires, the associated code is loaded into memory from disk and executed. The time taken to flush the code from memory can be ignored, as the tool never sees it. In most file systems, the deeper a file is within a directory hierarchy, the longer it takes to load (each directory in turn has to be opened and read to determine where the next directory or file entry is on disk), so a good disk cache is your friend. This time is compounded when an action is not atomic in nature. For example, in both ClearCase and CVS, check-in triggers are executed once for each file, and there is no corresponding trigger that is executed at the beginning of a multiple-file check-in. This means that for ten files there will be ten load-and-execute time penalties before the operation completes. And this assumes there is only one trigger involved.

So what can one do to reduce this penalty? Some of the more common approaches are:

  • If the tool is executing under a shell, write the triggers in the same shell's scripting language. That way the language itself will not have to be loaded in addition to the trigger script. This assumes the scripts are executed reasonably fast and that the scripting language is both robust and portable.
  • Make the triggers stand-alone executables. The downside of this is that they are not portable, and if the triggers have to execute on multiple platforms or OSes then multiple versions of each trigger must be maintained.
  • Use a reasonably fast platform-neutral scripting language such as Perl or Python and try to keep the language itself loaded into memory. This last is not possible on all OSes, but where it is possible it has great payback.
  • Keep the directory tree where the triggers are stored shallow.

There is one more thing one can do: plan your triggers and their implementation as if they were a real-time application. Blocking triggers must execute fast, so they should be as short as possible with the minimum of external interactions. No reading or writing to external files unless absolutely necessary. Ditto on communicating with other systems, regardless of the mechanism. If both pre- and post-processing triggers are anticipated for a single action, see if they can be combined in the post-processing trigger. And finally, within the pre- and post-processing triggers, separate what needs to occur immediately from what just needs to be done "sometime soon," and implement the two as separate functions. The part that is called from the tool's trigger mechanism executes as rapidly as possible and queues up the slower processing for a subsequent process to complete.

So how do these secondary processes know to run?

  • They can be connected to a task scheduler and run on a periodic basis.
  • They can be launched via a non-wait mode exec or system function.
  • They can be daemons that listen to a socket or for a software signal.

The most reliable, though not always the fastest, mechanism for a primary process to pass on the information to the secondary process is via a file. This way, even if there is a system failure, the information remains queued for subsequent "catch-up" processing. Other methods include the passing of parameters (exec method), writing to named pipes (any method) or writing to a socket (daemon method).
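
One way to sketch the file-based hand-off is a spool directory holding one file per event. The directory layout, the `.job` suffix and the JSON payload are assumptions for illustration. Writing to a temporary name and then renaming means the secondary process never sees a half-written file.

```python
import json
import os
import tempfile
import time
import uuid


def enqueue(spool_dir, event):
    """Write one event as its own file in spool_dir; return the final path."""
    os.makedirs(spool_dir, exist_ok=True)
    # Write under a temporary name so a reader never sees a partial file.
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(event, fh)
        fh.flush()
        os.fsync(fh.fileno())  # ask the OS to push the data to disk
    # A zero-padded timestamp prefix keeps names roughly in arrival order
    # when sorted; the random suffix avoids name collisions.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.job"
    final_path = os.path.join(spool_dir, name)
    os.replace(tmp_path, final_path)  # atomic rename within one file system
    return final_path
```

The trigger calls `enqueue` and returns at once; whatever is in the spool directory after a crash is still there for catch-up processing.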

Each of these execution methods has its pros and cons. The task scheduler method leaves one to the mercy of the scheduled intervals. This may often be perceived as too slow. One often hears, "I want my DIET system to reflect the state of my codebase immediately upon change. How else can I effectively schedule testing?" The pros of this method are that it is one of the simplest to implement, the code itself does not have to be exceptionally fast and there is only one instance of it executing at a time.
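
A secondary process suited to a task scheduler could be as simple as the sketch below. It assumes, hypothetically, that events are queued as one JSON file per event with a `.job` suffix and names that sort in arrival order; `handle` stands in for whatever slow work is needed, such as updating the DIET system.

```python
import json
import os


def drain(spool_dir, handle):
    """Process queued *.job files oldest-first; return how many succeeded."""
    done = 0
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".job"):
            continue  # skip temporary or unrelated files
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as fh:
            event = json.load(fh)
        try:
            handle(event)
        except Exception:
            # Stop here so later events are not applied out of order;
            # the file stays queued for the next scheduled run.
            break
        os.remove(path)  # remove only after the work succeeded
        done += 1
    return done
```

Because a failed event is left in place, the next scheduled run retries it, which is why this code does not need to be especially fast or clever.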

The exec method solves the problem of immediacy of update, but at the risk of not being able to complete its processing due to external constraints (like the network being down) and not being able to let the invoking tool know of the failure. It also does not lend itself to queuing failures for later reprocessing. A final con is that there may be many instances executing at any one time, so there are definite possibilities of running out of system resources (what will the tool do if it cannot perform the exec?) and there is no way to keep the operations chronologically in order. The biggest pro is the ease of implementation. This method should only be used for triggers that are not executed often.
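
In Python, a non-wait launch can be sketched with `subprocess.Popen`, which starts the child and returns without waiting for it to finish. The function name and the choice to return `None` on failure are illustrative assumptions; the point is that the only failure the trigger can see is the launch itself, not anything that goes wrong later in the child.

```python
import subprocess


def launch_detached(args):
    """Start a secondary process and return without waiting for it."""
    try:
        return subprocess.Popen(
            args,
            stdin=subprocess.DEVNULL,   # do not tie the child to the
            stdout=subprocess.DEVNULL,  # trigger's input or output streams
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The launch itself failed (missing program, no process slots, ...).
        return None
```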

And finally, the daemon method has several pros:

  • It maintains the chronological sequence of events as it uses a queue mechanism.
  • It starts executing when requested by the primary process instead of waiting for the next scheduled period.
  • It does not have to reside on the same system as the tool that initiated the trigger (think security).
  • It can be hooked to a scheduler in addition to being started by the primary process, so if there were external failures (again, such as the network being down) it can play catch-up.

The daemon method's biggest con is that it is the most difficult of the methods to implement, especially if it is done correctly and thoroughly tested. Don't pick this method for bragging rights; pick it because it is appropriate to your needs.
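
To give a feel for the moving parts, here is a deliberately small sketch of the daemon idea using Python's standard `socketserver`, `queue` and `threading` modules. The one-event-per-line protocol and the `start_daemon` name are assumptions for illustration. A real daemon would also need persistence, authentication, error handling and clean shutdown, which is where most of the difficulty lies.

```python
import queue
import socketserver
import threading


class EventHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Hypothetical protocol: one event per line; queue in arrival order.
        for raw in self.rfile:
            line = raw.decode("utf-8").strip()
            if line:
                self.server.events.put(line)


def start_daemon(process, host="127.0.0.1", port=0):
    """Start listener and worker threads; return the server object."""
    # A plain TCPServer serves one connection at a time, which keeps order.
    server = socketserver.TCPServer((host, port), EventHandler)
    server.events = queue.Queue()

    def worker():
        while True:
            event = server.events.get()
            if event is None:  # sentinel: stop the worker
                break
            process(event)

    threading.Thread(target=worker, daemon=True).start()
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

The primary process connects to `server.server_address`, writes its event lines and disconnects; the worker thread then handles the events in the order they were queued.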

Summary
Are triggers either necessary or desirable? Yes to both, but they should be planned so that the minimum number of them is implemented. Blocking triggers should be as fast as possible and not dependent on anything located on another system. Pre- and post-processing triggers should be split into a fast piece that queues up subsequent processing for a follow-on process to complete, and that secondary process should be implemented using the most appropriate method as determined by the project needs and the skills of the implementers.


Ben Weatherall is currently based in Fort Worth, Texas where he practices Practical CM on a daily basis using a combination of CVS and custom tools to support a modified Agile-SCRUM development methodology. He is a member of IEEE, ASEE (Association of Software Engineering Excellence – The SEI’s Dallas based SPIN Affiliate), NTLUG (North Texas Linux Users Group), and PLUG (Phoenix Linux Users Group).