Rebuilding When a File's Checksum Changes

Summary:

In this article, Ask Mr. Make shows a simple hack to GNU Make to cause it to do the right thing when the contents of a source file change.

A common scenario is that an engineer working on a build on their local machine rebuilds all objects and later gets the latest version of the source files from source code control.  Some source control systems set the timestamp on each source file to the time at which the file was checked in; in that case the object files will have timestamps that are later than the (potentially changed) source code, and GNU Make will not rebuild them.

In this article I show a simple hack to GNU Make to cause it to do the right thing when the contents of a source file change.

A Simple Example

The following simple Makefile builds an object file foo.o from foo.c and foo.h using the GNU Make built-in rule to make a .o file from a .c file.

.PHONY: all
all: foo.o

foo.o: foo.c foo.h

If either foo.c or foo.h is newer than foo.o then foo.o will be rebuilt.

If foo.h were to change without updating its timestamp then GNU Make would do nothing.   For example, if foo.h were updated from source code control, this Makefile might do the wrong thing.
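The failure mode can be reproduced from a shell. The following is a minimal sketch, assuming a POSIX-like shell that supports touch -t and the -nt file test (bash and dash both do); the scratch directory, file contents and times are made up for illustration.

```shell
# Work in a scratch directory so nothing real is touched.
dir=$(mktemp -d)
cd "$dir"

# foo.h is "checked in" at 12:00 on 1 January 2020; foo.o is built at 13:00.
echo '#define VALUE 1' > foo.h
touch -t 202001011200 foo.h
touch -t 202001011300 foo.o

# Simulate a source control update that changes foo.h but stamps it with
# a check-in time (12:30) that is still older than foo.o.
echo '#define VALUE 2' > foo.h
touch -t 202001011230 foo.h

# A pure timestamp comparison sees nothing to rebuild, even though the
# contents of foo.h are different.
if [ foo.h -nt foo.o ]; then
    result="rebuild"
else
    result="up to date"
fi
echo "foo.o is considered: $result"
```

Here foo.o is reported as up to date, which is exactly the situation the rest of the article works around.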

To work around this problem what's needed is a way to force GNU Make to consider the contents of the file and not its timestamp.  Since GNU Make can only handle timestamps internally, we need to hack the Makefile so that file timestamps are related to file contents.

Hashing File Contents

An easy way to detect a change in a file is to use a cryptographic hash function, such as MD5, to generate a hash of the file.  Since any change in the file will, for all practical purposes, cause the hash to change, just examining the hash is enough to detect a change in the file's contents.

To force GNU Make to check the contents of each file we'll associate a file with the extension .md5 with every source code file that we want to test. Each .md5 file will contain the MD5 checksum of the corresponding source code file.

In the example above source code files foo.c and foo.h will have associated .md5 files foo.c.md5 and foo.h.md5.  To generate the MD5 checksum we can use the md5sum utility which outputs a hexadecimal string containing the MD5 checksum of its input file.
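For reference, md5sum prints the checksum followed by the file name, and a change to the file's contents produces a different checksum. A small sketch, assuming the GNU coreutils md5sum is on the PATH; the scratch file and its contents are hypothetical.

```shell
# Work in a scratch directory.
dir=$(mktemp -d)
cd "$dir"

echo 'int x = 1;' > foo.c
before=$(md5sum foo.c)      # one line: 32 hex digits, two spaces, file name
echo "$before"

echo 'int x = 2;' > foo.c   # change a single character of the contents
after=$(md5sum foo.c)
echo "$after"

# The two lines differ only in the checksum field.
[ "$before" != "$after" ] && echo "checksum changed"
```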

If we arrange that the timestamp of the .md5 file changes when the checksum changes then GNU Make can check the timestamp of the .md5 file in lieu of the actual source file. 

In the example, GNU Make would check the timestamp of foo.c.md5 and foo.h.md5 to determine whether foo.o needs to be rebuilt.
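The core trick, rewriting the .md5 file only when the checksum differs, can be sketched in plain shell before wiring it into a Makefile. The function name update_md5 is hypothetical and exists only for this illustration; the sketch assumes md5sum is available.

```shell
# update_md5 FILE: rewrite FILE.md5 only when FILE's checksum has changed,
# so that FILE.md5's timestamp moves only when FILE's contents do.
update_md5() {
    new=$(md5sum "$1")
    old=$(cat "$1.md5" 2>/dev/null)   # empty if FILE.md5 does not exist yet
    if [ "$new" != "$old" ]; then
        echo "$new" > "$1.md5"
        echo "updated $1.md5"
    else
        echo "unchanged $1.md5"
    fi
}

dir=$(mktemp -d)
cd "$dir"
echo 'int x = 1;' > foo.c

first=$(update_md5 foo.c)    # no foo.c.md5 yet, so it is created
second=$(update_md5 foo.c)   # same contents, so foo.c.md5 is left alone
echo 'int x = 2;' > foo.c
third=$(update_md5 foo.c)    # contents changed, so foo.c.md5 is rewritten
printf '%s\n' "$first" "$second" "$third"
```

Because the second call leaves foo.c.md5 untouched, its timestamp still reflects the last time the contents of foo.c actually changed.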

The Modified Makefile

Here's the completed Makefile with MD5 checksum checking:

to-md5 = $1 $(addsuffix .md5,$1)

.PHONY: all
all: foo.o

foo.o: $(call to-md5,foo.c foo.h)

%.md5: FORCE
	@$(if $(filter-out $(shell cat $@ 2>/dev/null),$(shell md5sum $*)),md5sum $* > $@)

FORCE:

The first thing to notice here is that the prerequisite list for foo.o has changed from foo.c foo.h to $(call to-md5,foo.c foo.h).  The to-md5 function defined in the Makefile returns its argument followed by the same file names with the suffix .md5 added.  So after expansion the line reads foo.o: foo.c foo.h foo.c.md5 foo.h.md5.  This tells GNU Make that foo.o is to be rebuilt if either of the .md5 files is newer, as well as if either foo.c or foo.h is newer.

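To ensure that the .md5 files always carry the correct timestamp, the rule that makes them is run on every build.  Each .md5 file is remade by the %.md5: FORCE rule.  FORCE is a target with no prerequisites and no commands, and no file of that name exists, so GNU Make always treats it as having been updated; that means the .md5 files are examined every time.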

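The commands for the %.md5: FORCE rule will only actually rebuild the .md5 file if it does not exist or if the checksum stored in it differs from the checksum of the corresponding file.  The $(filter-out ...) call removes from the fresh md5sum output any words already present in the .md5 file; if anything is left over, the $(if ...) expands to md5sum $* > $@, which rewrites the .md5 file and so updates its timestamp.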

About the author

John Graham-Cumming

John Graham-Cumming is Co-Founder at Electric Cloud, Inc. Prior to joining Electric Cloud, John was a Venture Consultant with Accel Partners, VP of Internet Technology at Interwoven, Inc. (IWOV), VP of Engineering at Scriptics Corporation (acquired by Interwoven), and Chief Architect at Optimal Networks, Inc. John holds BA and MA degrees in Mathematics and Computation and a Doctorate in Computer Security from Oxford University. John is the creator of the highly acclaimed open source POPFile project. He also holds two patents in network analysis and has others pending.
