Back in March 2006 I wrote about forcing GNU Make to rebuild targets when commands change. Another common request is to rebuild when the contents of a file change, and not just its timestamp.
This usually comes up because the timestamps on generated code, or in code extracted from a source code control system, are older than related objects and hence GNU Make does not know to rebuild the object. This can occur even though the contents of the file are different from the last time the object was built.
A common scenario is that an engineer working on a build on their local machine rebuilds all objects and later gets the latest version of the source files from source code control. Some source control systems set the timestamp on each source file to the time at which the file was checked in; in that case the freshly built object files will have timestamps that are later than the (potentially changed) source code.
In this article I show a simple hack to GNU Make to cause it to do the right thing when the contents of a source file change.
A Simple Example
The following simple Makefile builds an object file foo.o from foo.c and foo.h using GNU Make's built-in rule for making a .o file from a .c file.
foo.o: foo.c foo.h
If either foo.c or foo.h is newer than foo.o, then foo.o will be rebuilt.
If foo.h were to change without updating its timestamp then GNU Make would do nothing. For example, if foo.h were updated from source code control, this Makefile might do the wrong thing.
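The failure mode is easy to reproduce in a shell. The following is a minimal sketch, assuming a shell whose test builtin supports the -nt (newer than) comparison; the scratch directory, file contents, and dates are made up for illustration and do not come from any real project.

```shell
#!/bin/sh
# Reproduce a header whose contents change while its timestamp stays old.
dir=$(mktemp -d)                 # scratch directory for the illustration
cd "$dir"

echo '#define VERSION 1' > foo.h
touch -t 200603011200 foo.h      # pretend foo.h was checked in long ago
touch foo.o                      # the object file was built just now

# New contents arrive from source control, stamped with an old check-in time.
echo '#define VERSION 2' > foo.h
touch -t 200603021200 foo.h

# A pure timestamp comparison, like the one GNU Make performs, sees no work.
if [ foo.h -nt foo.o ]; then
    result="rebuild"
else
    result="up to date"
fi
echo "$result"
```

Although foo.h now has different contents, it is still older than foo.o, so the comparison reports that nothing needs rebuilding.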
To work around this problem what's needed is a way to force GNU Make to consider the contents of the file and not its timestamp. Since GNU Make can only handle timestamps internally, we need to hack the Makefile so that file timestamps are related to file contents.
Hashing File Contents
An easy way to detect a change in a file is to use a cryptographic hash function, such as MD5, to generate a hash of the file. Since any change in the file will, for all practical purposes, cause the hash to change, just examining the hash is enough to detect a change in the file's contents.
To force GNU Make to check the contents of each file we'll associate a file with the extension .md5 with every source code file that we want to test. Each .md5 file will contain the MD5 checksum of the corresponding source code file.
In the example above, the source code files foo.c and foo.h will have associated .md5 files foo.c.md5 and foo.h.md5. To generate the MD5 checksum we can use the md5sum utility, which outputs the MD5 checksum of its input file as a hexadecimal string followed by the file name.
If we arrange that the timestamp of the .md5 file changes when the checksum changes then GNU Make can check the timestamp of the .md5 file in lieu of the actual source file.
In the example, GNU Make would check the timestamp of foo.c.md5 and foo.h.md5 to determine whether foo.o needs to be rebuilt.
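The idea of tying a .md5 file's timestamp to the checksum can be sketched in plain shell, independent of GNU Make. In this sketch update_md5 is a hypothetical helper name, and md5sum from GNU coreutils is assumed to be installed. The helper rewrites the .md5 file only when the stored checksum differs from the current one, so the file's timestamp moves only when the contents change.

```shell
#!/bin/sh
# update_md5 FILE: refresh FILE.md5 only if FILE's checksum has changed.
# (update_md5 is an illustrative name, not part of GNU Make or coreutils.)
update_md5() {
    new=$(md5sum "$1")                  # current checksum line for FILE
    old=$(cat "$1.md5" 2>/dev/null)     # stored checksum, empty if missing
    if [ "$new" != "$old" ]; then
        printf '%s\n' "$new" > "$1.md5" # rewriting bumps the timestamp
        echo "updated"
    else
        echo "unchanged"                # timestamp of FILE.md5 is left alone
    fi
}

dir=$(mktemp -d)                        # scratch directory for the illustration
cd "$dir"
echo 'int x;' > foo.h

first=$(update_md5 foo.h)    # no foo.h.md5 yet, so it is created
second=$(update_md5 foo.h)   # same contents, foo.h.md5 is not touched
echo 'int y;' > foo.h
third=$(update_md5 foo.h)    # contents differ, foo.h.md5 is rewritten
echo "$first $second $third"
```

Because foo.h.md5 is written only on the first and third calls, its timestamp tracks changes to the contents of foo.h rather than changes to the timestamp of foo.h.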
The Modified Makefile
Here's the completed Makefile with MD5 checksum checking:
to-md5 = $1 $(addsuffix .md5,$1)
foo.o: $(call to-md5,foo.c foo.h)
%.md5: FORCE
	@$(if $(filter-out $(shell cat $@ 2>/dev/null),$(shell md5sum $*)),md5sum $* > $@)
FORCE:
The first thing to notice here is that the prerequisite list for foo.o has changed from foo.c foo.h to $(call to-md5,foo.c foo.h). The to-md5 function defined in the Makefile returns its argument followed by the same file names with the suffix .md5 added. So after expansion the line reads foo.o: foo.c foo.h foo.c.md5 foo.h.md5. This tells GNU Make that foo.o is to be rebuilt if either of the .md5 files is newer, as well as if either foo.c or foo.h is newer.
To ensure that the .md5 files always carry the correct timestamp, the rule that makes them is run on every build. Each .md5 file is remade by the %.md5: FORCE rule. Because FORCE is an empty rule with no prerequisites or commands, GNU Make always treats it as having been updated, which means that the .md5 files are examined every time.
The commands for the %.md5: FORCE rule will only actually rebuild the .md5 file