Enhancing CM Tools with Triggers
Monday, 02 October 2006
There are two ways of enhancing CM tools: wrappers and
triggers. Wrappers "wrap" the execution of command-line calls or API calls, while
triggers are invoked from within the CM tools themselves via hooks or API
calls. Wrappers are "old school" and they work, but they can generally only block a
command or do post-command processing. Triggers, since they are invoked from
within the tools themselves, allow finer granularity, especially when
dealing with macro-level commands. The rest of this article concerns itself
only with triggers, though the general concepts apply to wrappers as well.
There are
three basic types of triggers:
- Blocking
- Pre-processing
- Post-processing
Blocking triggers evaluate criteria to determine whether the
specified action should be allowed to proceed. For example, a blocking trigger
might prevent check-ins when no revision comment is present.
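The revision-comment check can be sketched as a small script. This is a minimal sketch, assuming the tool passes the path of a file containing the comment as the first argument and treats a nonzero exit status as "block the check-in"; the minimum length and the function names are hypothetical, so adapt them to the hook your tool actually provides.

```python
import sys
from pathlib import Path

MIN_COMMENT_LENGTH = 10  # hypothetical policy threshold

def comment_is_acceptable(comment: str) -> bool:
    """Reject empty or trivially short revision comments."""
    return len(comment.strip()) >= MIN_COMMENT_LENGTH

def main(argv: list[str]) -> int:
    # Assumption: argv[1] is the path of a file holding the revision comment.
    if len(argv) < 2:
        print("blocking trigger: no comment file supplied", file=sys.stderr)
        return 1
    comment = Path(argv[1]).read_text(encoding="utf-8", errors="replace")
    if not comment_is_acceptable(comment):
        print("check-in rejected: please supply a revision comment", file=sys.stderr)
        return 1
    return 0  # zero exit status lets the check-in proceed

if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Note that the check touches nothing but the comment itself: no network, no shared files. That keeps it in line with the advice later in this article that blocking triggers be fast and self-contained.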
Pre-processing triggers often set up or initiate other
actions before the triggering action is executed. For example, a pre-processing
trigger might add the filespec (directory, filename and revision) of a file being
checked in to one or more Change Set lists, based on information parsed from the
revision comment.
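Here is a sketch of the Change Set example. It assumes a hypothetical comment convention in which change sets are named as `CS:<name>` and a hypothetical `directory/filename@revision` filespec format. The lists are kept in memory only for illustration; a real trigger would persist them somewhere.

```python
import re
from collections import defaultdict

# Hypothetical convention: the revision comment names change sets as "CS:<name>".
CHANGE_SET_PATTERN = re.compile(r"\bCS:([A-Za-z0-9_.-]+)")

def change_sets_from_comment(comment: str) -> list[str]:
    """Return change set names in the order they appear, without duplicates."""
    seen: dict[str, None] = {}
    for name in CHANGE_SET_PATTERN.findall(comment):
        seen.setdefault(name, None)
    return list(seen)

def add_to_change_sets(lists: dict[str, list[str]], directory: str,
                       filename: str, revision: str, comment: str) -> None:
    """Append the file's filespec to every change set list named in the comment."""
    filespec = f"{directory}/{filename}@{revision}"
    for name in change_sets_from_comment(comment):
        lists[name].append(filespec)

change_set_lists: dict[str, list[str]] = defaultdict(list)
add_to_change_sets(change_set_lists, "src/parser", "lexer.c", "1.14",
                   "Handle tabs. CS:release-2.1 CS:hotfix-42")
print(dict(change_set_lists))
```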
Post-processing triggers deal with the results of an
action, often extending the effects of that action. Not all tools support
post-processing triggers that fire when an action fails, so operations that
would naturally be pre-processing are often done here instead; otherwise, if the
action failed, there would be no way to "roll back" what a pre-processing trigger
had already done. For example, the Defect ID(s) parsed from a revision comment
during a check-in are used to update the appropriate record(s) in the DIET
(Defect, Issues and Enhancement Tracking) system.
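The DIET example might look like the following minimal sketch. The `DEF-<number>` Defect ID format and the `DietUpdate` record are hypothetical, and actually submitting the updates to a DIET system is not shown. The `succeeded` guard reflects the point above: work is only done once the check-in is known to have completed, so there is never anything to roll back.

```python
import re
from dataclasses import dataclass

# Hypothetical Defect ID format, e.g. "DEF-1234"; adapt to your DIET system.
DEFECT_ID_PATTERN = re.compile(r"\bDEF-(\d+)\b")

@dataclass(frozen=True)
class DietUpdate:
    defect_id: int
    filespec: str
    note: str

def post_checkin(succeeded: bool, filespec: str, comment: str) -> list[DietUpdate]:
    """Build DIET record updates for a check-in that has already completed.

    A failed check-in produces no updates, so nothing ever needs rolling back.
    """
    if not succeeded:
        return []
    ids = sorted({int(m) for m in DEFECT_ID_PATTERN.findall(comment)})
    return [DietUpdate(i, filespec, f"Checked in {filespec}") for i in ids]

for update in post_checkin(True, "src/parser/lexer.c@1.14", "Fixes DEF-17 and DEF-9"):
    print(update)
```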
One of the biggest penalties of using triggers is that
they take time to execute. In most cases they are launched by a tool's
internal trigger mechanism but exist outside the tool proper as either
scripts or custom executables. This means that every time a trigger fires, the
associated code is loaded into memory from disk and executed. The time spent
flushing the code from memory can be ignored, as the tool never sees it. In most
file systems, the deeper a file is within a directory hierarchy, the longer it
takes to load (each directory in turn has to be opened and read in order to
determine where the next directory or file entry is on disk), so a good disk
cache is your friend. This time is compounded when an action is not atomic in
nature. For example, in both ClearCase and CVS, check-in triggers are executed
once for each file, and there is no corresponding trigger that is executed at
the beginning of a multiple-file check-in. This means that for ten files there
will be ten load-and-execute time penalties before the operation completes. And
this assumes there is only one trigger involved.
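One way to see the load-and-execute penalty on your own system is to compare launching a fresh interpreter for every file against calling the same trivial logic inside one process. This sketch uses Python's own interpreter as a stand-in for a trigger. Actual numbers depend on the machine, the disk cache and the interpreter, so no figures are claimed here.

```python
import subprocess
import sys
import time

def noop_trigger() -> int:
    """Stand-in for trigger logic that does almost no work."""
    return 0

def time_separate_processes(n: int) -> float:
    """Seconds to fire the trigger n times, loading a fresh interpreter each time."""
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - start

def time_in_process(n: int) -> float:
    """Seconds to run the same (empty) trigger logic n times inside one process."""
    start = time.perf_counter()
    for _ in range(n):
        noop_trigger()
    return time.perf_counter() - start

if __name__ == "__main__":
    files = 10  # one trigger firing per file in a ten-file check-in
    print(f"separate processes: {time_separate_processes(files):.3f} s")
    print(f"in-process calls:   {time_in_process(files):.6f} s")
```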
So what
can one do to reduce this penalty? Some of the more common approaches are:
- If the tool is executing under a shell, write
the triggers in the same shell's scripting language. That way the language
itself will not have to be loaded in addition to the trigger script. This assumes
the scripts are executed reasonably fast and that the scripting language is
both robust and portable.
- Make the triggers stand-alone executables. The
downside is that they are not portable: if the triggers have to
execute on multiple platforms or operating systems, then multiple versions of each
trigger must be maintained.
- Use a reasonably fast, platform-neutral scripting
language such as Perl or Python and try to keep the language itself loaded in
memory. This is not possible on all operating systems, but where it is possible it
has a great payback.
- Keep the directory tree in which the triggers are
stored shallow.
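To keep the interpreter resident, you can run a long-lived process that loads every trigger once and then answers requests over a pipe. Each hook invocation then only needs a very small client (not shown) to forward the event. This is a sketch under assumptions: the one-JSON-object-per-line request format, the trigger names and the registration decorator are all hypothetical.

```python
import json
import sys
from typing import Callable, TextIO

# Registry of trigger functions, loaded once for the life of the process.
TRIGGERS: dict[str, Callable[[dict], dict]] = {}

def trigger(name: str):
    """Decorator registering a function under a (hypothetical) trigger name."""
    def register(func: Callable[[dict], dict]) -> Callable[[dict], dict]:
        TRIGGERS[name] = func
        return func
    return register

@trigger("require-comment")
def require_comment(event: dict) -> dict:
    """Blocking check: allow the action only if a comment is present."""
    return {"allow": bool(event.get("comment", "").strip())}

def serve(requests: TextIO, replies: TextIO) -> None:
    """Answer one JSON request per line without reloading the interpreter."""
    for line in requests:
        if not line.strip():
            continue
        event = json.loads(line)
        handler = TRIGGERS.get(event.get("trigger", ""))
        reply = handler(event) if handler else {"error": "unknown trigger"}
        replies.write(json.dumps(reply) + "\n")
        replies.flush()

if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```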
There is one more thing one can do: plan your triggers and
their implementation as if they were a real-time application. Blocking triggers
must execute fast, so they should be
as short as possible, with a minimum of external interactions: no reading or
writing of external files unless absolutely necessary, and likewise no communicating
with other systems, regardless of the mechanism. If both pre- and post-processing
triggers are anticipated for a single action, see whether they can be combined in
the post-processing trigger. And finally, split the pre- and post-processing
triggers into what needs to occur immediately and what just needs to be done
"sometime soon," and implement these as separate functions. The part
that is called from the tool's trigger mechanism executes as rapidly as
possible and queues up the slower processing for a subsequent process to complete.
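The fast half of such a split might do nothing but record the event and return. This minimal sketch assumes a spool directory that some secondary process drains later. The directory name, the `TRIGGER_SPOOL` environment variable and the event fields are hypothetical. Writing under a temporary name and then renaming keeps a reader from ever seeing a half-written entry.

```python
import json
import os
import time
import uuid
from pathlib import Path

# Hypothetical spool directory shared with the secondary process.
SPOOL_DIR = Path(os.environ.get("TRIGGER_SPOOL", "trigger-spool"))

def enqueue(event: dict, spool_dir: Path = SPOOL_DIR) -> Path:
    """Fast half of a trigger: record the event and return immediately."""
    spool_dir.mkdir(parents=True, exist_ok=True)
    # Timestamp prefix keeps names sorting roughly in arrival order;
    # the UUID suffix keeps concurrent triggers from colliding.
    name = f"{time.time_ns():020d}-{uuid.uuid4().hex}.json"
    tmp = spool_dir / (name + ".tmp")
    tmp.write_text(json.dumps(event), encoding="utf-8")
    final = spool_dir / name
    os.replace(tmp, final)  # atomic rename within one file system
    return final

def post_checkin_trigger(filespec: str, comment: str) -> int:
    """Called by the CM tool; all slow work (e.g. DIET updates) is deferred."""
    enqueue({"action": "checkin", "filespec": filespec, "comment": comment})
    return 0
```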
So how do
these secondary processes know to run?
- They can be connected to a task scheduler and
run on a periodic basis.
- They can be launched via a non-wait mode exec or
system function.
- They can be daemons that listen to a socket or
for a software signal.
The most reliable, though not always the fastest,
mechanism for a primary process to pass on the information to the secondary
process is via a file. This way, even if there is a system failure, the
information remains queued for subsequent "catch-up" processing. Other methods
include the passing of parameters (exec method), writing to named pipes (any
method) or writing to a socket (daemon method).
Each of these execution methods has its pros and cons. The
task scheduler method leaves one to the mercy of the scheduled intervals. This
may often be perceived as too slow. One often hears, "I want my DIET system to
reflect the state of my codebase immediately upon change. How else can I
effectively schedule testing?" The pros of this method are that it is one of
the simplest to implement, the code itself does not have to be exceptionally
fast and there is only one instance of it executing at a time.
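The "one instance at a time" property is easiest to guarantee explicitly, in case a run overlaps the next scheduled interval. This sketch uses a lock file created with `O_CREAT | O_EXCL`, which fails if the file already exists. The lock path and the `work` callable are placeholders. A stale lock left behind by a crash would still have to be cleared by hand or by some additional check, which is not shown.

```python
import os
from pathlib import Path
from typing import Callable, Optional

class SingleInstance:
    """Lock file ensuring only one scheduled run executes at a time."""

    def __init__(self, lock_path: Path):
        self.lock_path = lock_path
        self.fd: Optional[int] = None

    def acquire(self) -> bool:
        try:
            # Fails with FileExistsError if another run holds the lock.
            self.fd = os.open(self.lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            return False
        os.write(self.fd, str(os.getpid()).encode())  # record the owner
        return True

    def release(self) -> None:
        if self.fd is not None:
            os.close(self.fd)
            self.fd = None
            self.lock_path.unlink()

def scheduled_run(lock_path: Path, work: Callable[[], None]) -> bool:
    """Entry point for the task scheduler; returns False if the run was skipped."""
    lock = SingleInstance(lock_path)
    if not lock.acquire():
        return False
    try:
        work()
    finally:
        lock.release()
    return True
```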
The exec method solves the problem of immediacy of update,
but at the risk of not being able to complete its processing due to external
constraints (like the network being down) and not being able to let the
invoking tool know of the failure. It also does not lend itself to queuing
failures for later reprocessing. A final con is that there may be many instances executing at any one
time, so there are definite possibilities of running out of system resources
(what will the tool do if it cannot perform the exec?) and there is no way to
keep the operations chronologically in order. The biggest pro is the ease of
implementation. This method should only be used for triggers that are not
executed often.
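For comparison, the exec method amounts to little more than starting a process and not waiting for it. This sketch passes parameters on the command line; the secondary script and its arguments are hypothetical. The cons described above are visible in the code. The trigger returns at once, but it never learns whether the secondary work succeeded, and nothing limits how many copies run concurrently.

```python
import subprocess
import sys

def launch_secondary(script: str, *args: str) -> subprocess.Popen:
    """Start the secondary process without waiting for it (the exec method)."""
    return subprocess.Popen(
        [sys.executable, script, *args],  # parameters passed on the command line
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX only: detach from the trigger's session
    )
```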
And
finally, the daemon method has several pros:
- It maintains the chronological sequence of
events as it uses a queue mechanism.
- It starts executing when requested by the
primary process instead of waiting for the next scheduled period.
- It does not have to reside on the same system as
the tool that initiated the trigger (think security).
- It can be hooked to a scheduler in addition to
being started by the primary process, so if there were external failures
(again, such as the network being down) it can play catch-up.
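A bare-bones version of the daemon method might use Python's standard `socketserver` module. This sketch makes several assumptions. Primary processes connect over TCP and send one event per line; the line format is hypothetical. A single worker thread drains a FIFO queue, so events are handled one at a time in the order they arrived. Re-spooling failed events and hooking the daemon to a scheduler for catch-up are left out.

```python
import queue
import socketserver
import sys
import threading
from typing import Callable

class EventHandler(socketserver.StreamRequestHandler):
    """Receives events, one per line, from primary (trigger) processes."""
    def handle(self) -> None:
        for raw in self.rfile:
            line = raw.decode("utf-8").strip()
            if line:
                self.server.events.put(line)  # FIFO keeps arrival order

class TriggerDaemon(socketserver.ThreadingTCPServer):
    allow_reuse_address = True
    daemon_threads = True

    def __init__(self, address: tuple[str, int], process: Callable[[str], None]):
        super().__init__(address, EventHandler)
        self.events: "queue.Queue[str]" = queue.Queue()
        self.process = process
        # One worker thread: events are processed sequentially, oldest first.
        threading.Thread(target=self._work, daemon=True).start()

    def _work(self) -> None:
        while True:
            event = self.events.get()
            try:
                self.process(event)
            except Exception as exc:  # keep the daemon alive on bad events
                print(f"failed to process {event!r}: {exc}", file=sys.stderr)
            finally:
                self.events.task_done()
```

Started with `TriggerDaemon(("127.0.0.1", port), handler).serve_forever()`, it lets a trigger's fast half simply connect, write one line and disconnect.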
The daemon method's biggest con is that it is the most
difficult of the methods to implement, especially if it is to be done correctly and
thoroughly tested. Don't pick this method for bragging rights; pick it because
it is appropriate to your needs.
Summary
Are triggers either necessary or desirable? Yes to both,
but they should be planned so that the minimum number of them is implemented.
Blocking triggers should be as fast as possible and not dependent on anything
located on another system. Pre- and post-processing triggers should be split
into a fast piece that queues up subsequent processing for a follow-on process
to complete, and that secondary process should be implemented using the most
appropriate method as determined by the project needs and the skills of the
implementers.
Ben Weatherall is currently based in Fort Worth, Texas where
he practices Practical CM on a daily basis using a combination of CVS
and custom tools to support a modified Agile-SCRUM development
methodology. He is a member of IEEE, ASEE (Association of Software
Engineering Excellence – The SEI’s Dallas-based SPIN Affiliate), NTLUG
(North Texas Linux Users Group), and PLUG (Phoenix Linux Users Group).