Although 3-way merge tools have been available for a long time, their trustworthiness has been insufficient for just as long. This results in development environments with: a) concurrency forbidden, b) "gotta-do-a-merge" blues, and/or c) additional CM staff to manually review all merge results. All of these alternatives are productivity killers and feed the black hole where all theoretical ROI goes. This paper describes an advanced 3-way merge solution. First, it discusses the many problems that have plagued 3-way merge implementations; test case kits are provided to detect their existence. Then, it shows how solutions for these problems were designed and implemented in a working product.

Introduction

Tools which support concurrent development are more important today than ever before. Open Source products, offshore/outsourced development and maintenance, and customer customization of packaged applications' procedures and/or metadata are among the most challenging concurrent development environments. While a great deal of discussion about collaboration, XP, and Agility seems to assume concurrency is practical and productive, very little discussion has occurred regarding these other concurrent environments, which are NOT collaborating. They are not collaborating in the sense that developers making changes are for the most part unaware of each other. Another significant difference is the frequency of synchronization: collaborating teams tend to synchronize often, while these other, more challenging types of concurrent environments tend to synchronize infrequently.

In collaborating environments, changes are sometimes shared and then improved upon. These parallel step-wise refinements can pose puzzling synchronization pattern challenges when they're "re-merged". Any paradigm supporting concurrent development depends upon a trustworthy 3-way merge to automate the synchronization process.

In collaborating environments, the frequency of synchronization and team awareness tend to minimize the concurrency problems which may occur. However, 3-way merges are used to automate the process regardless of the Branching Model [4] environment.

A 3-way merge takes the original version of a file and two separate versions, and merges both sets of changes into a new file. The changes in each version are determined by comparing that version with the original. The 3-way merge automatically applies all the changes from each version that do not overlap. Any changes that do overlap are brought to the attention of the merge operator for resolution.

SCM systems such as SCCS [2] and RCS [3] (plus dozens more based upon these tools) supporting concurrent development models have existed for a long time. Although 3-way merge tools have been available for a long time, their trustworthiness has been insufficient for just as long. These problems can cause development environments to forbid concurrency, which is not always possible and is itself counter-productive. When concurrency is supported in the environment, synchronization can become a dreadful, depressing, counter-productive process for everyone involved. In some environments the synchronization process is "managed" by the CM staff. This isolates the pain, but requires additional CM resources.

This paper describes an advanced 3-way merge solution. First, it discusses the many problems that have plagued 3-way merge implementations. Then, it shows how solutions for these problems were designed and implemented in a working product, Guiffy SureMerge. Test case kits are provided to detect the problems discussed and demonstrate the SureMerge solutions. It concludes with some observations on the benefits of a trustworthy merge tool.

"Conflict" Definition

3-way merges look for "conflicts" defined as changes that overlap.
However, implementations vary on what is defined as an overlapping change. Some implementations do NOT treat lines inserted at the same point in both version files as a conflict. Most of the time this may not be a problem, but you would still like to know about it, and perhaps switch the order of the inserted changes. On the other hand, if the inserted line(s) in both versions define a new indexed value (e.g., a function number, message type, or error code), both versions will be adding a new index definition with the same value. If these insertions are automatically merged, the merged code will probably build properly, but one of the new indexed items will NOT work. Because there is some central code like this in almost every product, it is important for any trustworthy 3-way merge to consider inserts as potential conflicts. The most dangerous deficiency, found in some 3-way merges, is losing one version's inserted changes altogether. The {1} Test Case Kit can be used to evaluate how a 3-way merge handles inserts at the same location.

Pathological Cases

The following scenarios pose challenging exceptions which are known to confuse some 3-way merges. Sometimes the 3-way merge identifies the conflicts and non-conflicts satisfactorily, but the result just looks confusing. In the worst case, the 3-way merge itself becomes confused: it fails to identify a conflict, loses a change, or auto-merges the wrong change.

BOF/EOF Stingers

For a variety of "reasons", 3-way merges can get confused by conflicting changes at the beginning or end of the files. Some 3-way merges post-process their 3-way minimum lines of differences with heuristics in order to improve their appearance or avoid problems caused by shifting unique anchors. If they reach the beginning of the file while looking back, or reach the end of the file while looking forward, they can "give up". The {2} Test Case Kit can be used to evaluate how a 3-way merge handles changes in both versions at the EOF.
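As a rough illustration of the conflict definition argued for above, here is a minimal Python sketch of a line-based 3-way merge that treats inserts at the same point as a conflict, including inserts at the end of the file. It is built on the standard library's `difflib.SequenceMatcher` and is not the algorithm of any product discussed in this paper. The function names (`edits`, `clashes`, `render`, `merge3`) and the conflict marker text are hypothetical, chosen only for this example. Changes that merely touch without overlapping are still auto-merged by this sketch.

```python
from difflib import SequenceMatcher


def edits(parent, version):
    """Changes in `version` as (start, end, new_lines), in parent line indexes."""
    sm = SequenceMatcher(a=parent, b=version, autojunk=False)
    return [(i1, i2, version[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]


def clashes(a, b):
    """True if two parent ranges overlap, or both are inserts at the same point."""
    (a1, a2), (b1, b2) = a, b
    if a1 == a2 and b1 == b2:          # two pure inserts
        return a1 == b1
    return a1 < b2 and b1 < a2


def render(parent, side_edits, start, end):
    """Return parent[start:end] with one version's edits applied."""
    out, pos = [], start
    for s, e, new in side_edits:
        out += parent[pos:s] + new
        pos = e
    return out + parent[pos:end]


def merge3(parent, first, second):
    """Return (merged_lines, conflict_count) for a line-based 3-way merge."""
    tagged = sorted(
        [(s, e, new, 1) for s, e, new in edits(parent, first)]
        + [(s, e, new, 2) for s, e, new in edits(parent, second)],
        key=lambda t: (t[0], t[1]))
    clusters = []                      # groups of edits that clash with each other
    for s, e, new, side in tagged:
        if clusters and clashes(clusters[-1]["span"], (s, e)):
            c = clusters[-1]
            c["span"] = (c["span"][0], max(c["span"][1], e))
        else:
            c = {"span": (s, e), 1: [], 2: []}
            clusters.append(c)
        c[side].append((s, e, new))

    merged, conflicts, pos = [], 0, 0
    for c in clusters:
        start, end = c["span"]
        merged += parent[pos:start]
        a = render(parent, c[1], start, end)
        b = render(parent, c[2], start, end)
        if not c[2] or a == b:         # only 1st changed, or identical changes
            merged += a
        elif not c[1]:                 # only 2nd changed
            merged += b
        else:                          # genuine clash: leave it to the operator
            conflicts += 1
            merged += ["<<<<<<< 1st", *a, "=======", *b, ">>>>>>> 2nd"]
        pos = end
    return merged + parent[pos:], conflicts


# Both versions append a new code with the same value at the end of the file.
parent = ["OK = 0", "FAIL = 1"]
first = parent + ["TIMEOUT = 2"]
second = parent + ["DENIED = 2"]
merged, conflicts = merge3(parent, first, second)
print(conflicts)            # 1: the two inserts are reported, not silently merged
print("\n".join(merged))
```

The design choice that matters here is in `clashes`: two zero-length ranges at the same parent position count as a clash, so neither insert is dropped or silently ordered ahead of the other.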

Identical Twins

Most 3-way merges have been enhanced to recognize identical changes and not call them a conflict. If the tool's merge result view is in reverse, though, such changes are usually still visible and can be distracting. 3-way reverse compare-merge result views may seem easier to understand in simple cases. However, these reverse views (displaying all changes made by both versions to the original) become so confusing for complicated change patterns that users often quit the merge tool and do the job by hand. Using a 3-way compare reverse view for merging is like driving down an Exit ramp in reverse. The {3} Test Case Kit can be used to evaluate how a 3-way merge handles identical changes and how they appear.

X-Tuplets

This change pattern usually starts as identical twin changes shared in both versions. Then, some further revisions are made in one or both versions. As a result, the identical change block contains a (subset) change. When comparing version 1 with version 2, it may even appear to be two or more subset changes within the area of the original identical change. The {4} Test Case Kit can be used to evaluate how a 3-way merge handles such change patterns.

Hungry Blobs

This change pattern includes separate changes in each version which are close or adjacent to each other, resulting in one combined SureMerge "Attention". Such changes will usually not be detected as a "conflict" by other 3-way merges and will be auto-merged. The {5} Test Case Kit can be used to evaluate how a 3-way merge handles such change patterns.

SureMerge "Attentions": Beyond "Conflicts"

Guiffy's SureMerge considers changes of any type (NOT just lines changed). Likewise, SureMerge's "Attention Focus" goes beyond looking for conflicts and catches changes which touch one another (but don't overlap) or changes which are in close proximity to one another.
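The idea of flagging changes that touch or sit close together, even when they do not overlap, can be sketched as follows. This is not SureMerge's algorithm. It is a minimal Python illustration using the standard library's `difflib.SequenceMatcher`, and the names `changed_spans`, `gap_between`, `attentions`, and the `focus` parameter are all hypothetical. The sample data loosely mirrors the Hungry Blobs pattern: one version changes line 7 and the other changes line 6.

```python
from difflib import SequenceMatcher


def changed_spans(parent, version):
    """Parent line ranges (start, end) changed by `version`; inserts have start == end."""
    sm = SequenceMatcher(a=parent, b=version, autojunk=False)
    return [(i1, i2) for tag, i1, i2, _, _ in sm.get_opcodes() if tag != "equal"]


def gap_between(a, b):
    """Unchanged parent lines separating two ranges (0 if they touch or overlap)."""
    (a1, a2), (b1, b2) = a, b
    return max(0, max(a1, b1) - min(a2, b2))


def attentions(parent, first, second, focus=0):
    """Pairs of changes, one per version, that lie within `focus` lines of each other."""
    return [(a, b)
            for a in changed_spans(parent, first)
            for b in changed_spans(parent, second)
            if gap_between(a, b) <= focus]


parent = [f"line {n}" for n in range(1, 11)]
first = parent.copy()
first[6] = "line 7, changed in 1st"     # 1st version edits line 7
second = parent.copy()
second[5] = "line 6, changed in 2nd"    # 2nd version edits line 6

# The two ranges are adjacent, so a strict overlap test would not pair them,
# but the touch-or-proximity test does.
hits = attentions(parent, first, second)
print(hits)
```

Raising `focus` widens the net: with `focus=2`, changes separated by up to two unchanged lines would also be reported for review instead of being auto-merged.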

Minimum Blocks of Difference

Guiffy's SureMerge is based upon a special proprietary algorithm designed for the problems inherent in smart merging. Other 3-way merges simply apply a published compare algorithm [1] three ways. Applying these "unique anchor" minimum lines of difference algorithms three ways is the fundamental cause of most of the pathological problems discussed above. Indeed, the authors of those algorithms have acknowledged that "the 3-way merge is problematic and was not considered when designing the compare algorithm". In other words, it will not work sometimes, and any post-processing attempts will be "problematic" (not work in some cases).

3-Way Auto Focus

SureMerge's Minimum Blocks of Difference algorithm tends to express the differences in fewer change blocks (by treating separate changes close to each other as one change block). This naturally results in expanded Attention Focus areas, depending upon the size of the change. Moreover, a user control option is provided to expand the Attention Focus (if you want to be extra sure).

This screen shows how a typical 3-way merge would auto-merge two changes (one from each version) without seeing a "conflict".

[Screenshot]

And this screen shows how SureMerge's Auto Focus would identify the changes as an "Attention". Just one click of a button would keep the first two lines from the first version or the last two lines from the second version.

Conclusion

The problems discussed in this paper have plagued 3-way merge tools for many years, resulting in the general warning from experienced veterans to "be careful when you do that merge". To avoid these hazards, some folks resort to forbidding concurrent development completely or declare temporary periods of "code freezes", which can last for weeks or months while CM/QA/Development do the release "rain dance". In other environments concurrency is allowed, but everyone faces the dreaded merge process.
In larger organizations, additional CM staff are sometimes tasked with performing and/or manually reviewing all the merges.

Guiffy SureMerge avoids all these hazards and provides a trustworthy 3-way merge capability. It can make the difference between forbidding and embracing concurrent development. It can eliminate the "dreaded merge blues". It can shorten the time required for CM to package bug fix builds or synch up a new release. In today's world, "merge projects" to upgrade customizations to a new Open Source or packaged product release can take a team weeks to complete without a trustworthy 3-way merge. Teams using Guiffy SureMerge have completed merge projects in half the time (sometimes less).

Test Case Kits

Each test case has 3 files: the two versions and the parent (common ancestor or original). The files are named with extensions .1st, .2nd, and .parent. To acquaint yourself with each test case, begin by comparing the parent and 1st, then compare the parent and 2nd. Then, perform the 3-way merge.

{1} Conflict Definition and Inserts: conins.zip
This kit includes two test cases for lines inserted in each version at the same point. The FlownetController.cpp files are a real-world example of two concurrent changes adding different code at the same locations. The ErrorMsg.java files are an example of two concurrent changes adding error message types with the same value.

{2} BOF/EOF Stingers: eofsting.zip
This test case includes changes at the Beginning and End of File (BOF/EOF) in each version. In both the 1st and 2nd Misc.java versions, code is added at the beginning and end of the file.

{3} Identical Twins: idtwins.zip
This test case includes identical changes in each version. In both the 1st and 2nd About.java versions, line 7 in the parent was changed and a line was added after it. These two lines are identical in 1st and 2nd. The 2nd version has an additional change: it commented out (//) several lines near the end.

{4} X-Tuplets: tuplets.zip
This test case includes a similar change in each version that was shared to begin with, then slightly modified, resulting in 2 conflicts/Attentions. In both the 1st and 2nd getCLIArgs.java versions, a change block added the code for processing the -fb and -fe arguments. Then, in the 2nd version, improvements were made within that code. Other changes are made in both versions to be realistic and provide additional test validation points.

{5} Hungry Blobs: blobs.zip
This test case includes separate changes in each version which are adjacent to each other, resulting in one combined "Attention". In the RearViewMirror 1st version, line 7 is changed to add an if condition. In the RearViewMirror 2nd version, line 6 is changed to add the same condition to an existing if statement. SureMerge's Auto Focus identifies the changes as an "Attention". Just one click of a button would keep the first version or the second version.

References

[1] Eugene W. Myers, An O(ND) Difference Algorithm and its Variations. Algorithmica, Vol. 1 No. 2, 1986.
[2] Marc J. Rochkind, The Source Code Control System. IEEE Transactions on Software Engineering, Vol. SE-1, December 1975.
[3] Walter Tichy, RCS -- A System for Version Control. Software -- Practice and Experience, July 1985.
[4] Chuck Walrad and Darrel Strom, The Importance of Branching Models in SCM. IEEE Computer, Computing Practices, Vol. 35, September 2002.

Bill Ritcher is the Founder, President, and CEO of Guiffy Software, Inc. and author of the Guiffy SureMerge product. Prior to founding Guiffy Software, Bill was the Director of Development at Ross Systems, where he was responsible for the Gembase 4GL product. In the 1980s Bill co-founded Quintessential Solutions, Inc. and, as Vice President of Development, authored several least-cost, tariff-driven, wide area network design products. A software developer and manager since the early 1970s, Bill holds a B.S.B.A. degree in Systems Management from Rockhurst University.
Last Updated on Sunday, 05 August 2007 15:47




