And then the DVCS arrived on the scene and turned the SCM world upside down. Contrary to what we might have expected, the transformation was not restricted to “disconnected work” and the now famous “pushing and pulling” operations. In fact, the biggest change has been the impact on parallel development, thanks to the strong support for branching and merging included in all DVCS tools (both open source and commercial).

But there’s still a lot to do to move acceptance of the DVCS beyond early adopters. What else can be done to help developers in their daily version control experience?

Diffing and merging

Today I’ll be talking about some of the trends in the next generation of diff and merge tools and how they can impact a developer’s day-to-day activities.

Refactoring is the key

What does refactoring have to do with version control? If we apply it at the file level (you know, a Java class lives in a file named after the class), it is quite clear that proper tracking of moves and renames is key to supporting simple things like renaming a class or relocating it to a different module or package. The same applies to packages and directory trees.

As simple as it sounds, good support for moved and renamed files and directories has been around for only a few years (unless you were using one of the high-end commercial SCMs), and was totally out of the scope of some of the widely used open source version control systems before the DVCS era.

But today I’m thinking about operations performed inside a single file: you move a private method down in your class (following what you learned from Clean Code, for instance), you diff it with the previous version of your file, and the typical diff tool identifies two separate changes: one block added and another block removed. There’s no automated way to tell it’s really the same code.

The following picture shows a similar example: a method has been moved up and detected as a “delete-add” pair of changes.
What if the diff tool were able to detect that a single block of code has simply been moved? The result would be something like the following, where the developer is informed about the “move operation”:
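In text form, the idea can be sketched roughly as follows. This is a minimal Python illustration built on the standard `difflib` module. The `find_moves` helper, its exact-match pairing rule, and the sample `OLD`/`NEW` class are my own assumptions for the example. It is not the algorithm behind any particular tool.

```python
import difflib


def find_moves(old_lines, new_lines):
    """Pair deleted blocks with identical inserted blocks and report them as moves.

    Hypothetical helper: it only recognises blocks whose lines match exactly
    (ignoring indentation and blank lines), which is far cruder than a real tool.
    Returns a list of (old_line, new_line, length) tuples, 1-based.
    """
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    deleted, inserted = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.append((i1, old_lines[i1:i2]))
        if tag in ("insert", "replace"):
            inserted.append((j1, new_lines[j1:j2]))

    def normalize(block):
        # Compare blocks without indentation or blank lines.
        return tuple(line.strip() for line in block if line.strip())

    moves = []
    for old_start, old_block in deleted:
        key = normalize(old_block)
        if not key:
            continue
        for k, (new_start, new_block) in enumerate(inserted):
            if normalize(new_block) == key:
                moves.append((old_start + 1, new_start + 1, len(old_block)))
                del inserted[k]  # each inserted block can be claimed only once
                break
    return moves


# Sample input (made up for this sketch): second() is moved above first().
OLD = [
    "class Greeter {",
    "    void first() {",
    "        prepare();",
    "        greet();",
    "    }",
    "    void second() {",
    "        log();",
    "    }",
    "}",
]
NEW = [
    "class Greeter {",
    "    void second() {",
    "        log();",
    "    }",
    "    void first() {",
    "        prepare();",
    "        greet();",
    "    }",
    "}",
]

for old_line, new_line, length in find_moves(OLD, NEW):
    print(f"moved {length} lines: old line {old_line} -> new line {new_line}")
```

A plain line diff of `OLD` and `NEW` reports one inserted block and one deleted block. The pairing step is what turns that “delete-add” pair into a single move. A real tool would also have to cope with blocks that were edited while being moved, which exact matching cannot do.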
The move detection pictured above is based on a modified “Levenshtein distance” algorithm and, as such, is language agnostic, so it can only be considered an initial step towards the goals I mentioned above.

Refactoring can be harder

The idea is not new. In fact, it’s a typical question developers ask when they’re trained on a new version control system with better merge support: “Hey, are you able to deal with C# or Java code specifically?” In that sense, delivering a new generation of SCM able to handle diffs and merge conflicts based on “understanding” the underlying code would simply be “meeting customer expectations”.

Note: the Eclipse IDE is already able to display an outline of the diffs between files, showing which methods have been modified, added and so on.

Displaying a language-aware diff

My opinion is that we’ll end up with “combined views” in which some sort of class/method layout is combined with a traditional “text-based” diff, so that developers can choose which view they want to focus on. (This reminds me of the UML class diagrams embedded in some development tools: they’re good to some extent, especially at communicating the “big picture”. But in the end, the code is the best “representation” (as in DSLs) of the inner workings of a class.)

A simple visual outline like the following could help indicate that one method has been modified, one renamed and moved, one deleted, and a new method added.
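The same kind of outline can be approximated in text form. The sketch below is a minimal Python illustration, assuming a language parser has already extracted each method's name and body. The `outline_diff` function, the sample method names and the "same body, different name" rename rule are all hypothetical choices made for this example.

```python
def outline_diff(old_methods, new_methods):
    """Classify method-level changes between two versions of a class.

    Both arguments map method name -> method body text. Extracting them from
    real source code needs a language parser, which is assumed here.
    """
    changes = {}
    removed = {n: b for n, b in old_methods.items() if n not in new_methods}
    added = {n: b for n, b in new_methods.items() if n not in old_methods}

    # Same name in both versions but a different body: modified.
    for name, body in old_methods.items():
        if name in new_methods and new_methods[name] != body:
            changes[name] = "modified"

    # A removed method whose body reappears under a new name: renamed.
    for old_name, body in removed.items():
        new_name = next((n for n, b in added.items() if b == body), None)
        if new_name is None:
            changes[old_name] = "deleted"
        else:
            changes[old_name] = f"renamed to {new_name}"
            del added[new_name]

    # Whatever is left in the new version is genuinely new.
    for name in added:
        changes[name] = "added"
    return changes


# Sample input (made up for this sketch).
OLD_METHODS = {
    "method1": "return 1;",
    "method2": "return compute(x);",
    "method3": "log(x);",
    "method4": "cleanup();",
}
NEW_METHODS = {
    "renamedMethod3": "log(x);",
    "method1": "return 1;",
    "method2": "return compute(x) + 1;",
    "method5": "init();",
}

for name, change in outline_diff(OLD_METHODS, NEW_METHODS).items():
    print(f"{name}: {change}")
```

Because methods are compared by name and body rather than by line position, a method that merely changes place in the file produces no entry at all, which is exactly the noise a text-based diff cannot avoid.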
Remember that even a change as simple as this one can be extremely hard to follow with today’s conventional diff tools if the code blocks involved are non-trivial (or simply not especially short). The “method2” differences could then be easily expanded and most likely displayed in conventional “text diff” format:
Time to merge

One of the easiest things a “language-aware” merge can do is to transparently handle added code. Here’s a simple scenario to illustrate the idea: you add a method at the end of the file and another developer adds a different method at the end of the same file. A conventional merge tool will detect it as a conflict, since both contributors have changed the same location in the file. Obviously, such a conflict could be automatically resolved by a “language-aware” merge tool without any risk of error.

Another simple example illustrates the power of this kind of merging. Let’s take a look at the following C# code:
One developer modified the “using” area by introducing a new “using” statement. The second developer added the same “using” at a different location. There’s no way to easily resolve this conflict with a conventional “text-based” merge tool, but it would be trivial for a “language-aware” tool: it would go through the “using” statements, find the duplicate, and keep just one of them, all without user intervention.

While this example is perhaps too simple, it shows the wide range of benefits that “code parsing” brings to the SCM arena.

Shortcomings
Wrapping up

About the Author

Pablo got his degree in Computer Engineering from the University of Valladolid back in 2000. Since then he has worked on several projects and at several companies, always involved in software development. He worked for Sony Europe developing software for Sony's next-gen digital TVs (MHP). Pablo co-founded Codice Software in 2005, a start-up developing SCM software at global scale. Its product, Plastic SCM, is one of the most advanced Distributed Version Control Systems (DVCS) on the market. Pablo is an Associate Professor at Burgos University, focused on Project Management and especially interested in agile methodologies.
Hi Pablo! Take a look at some of the SCM papers (co)written by James J. Hunt during the late 1990s and early 2000s. He was writing a lot about language-aware merging and software support for extensible syntax-aware differencing/merging.




Five or six years ago, the SCM arena was in a comfortable “status quo”, in which tools delivered only what developers expected (or even less) and innovation didn’t arrive at a quick pace.




