Migrating to Subversion

In this article, we will look into the process of migrating from other version control systems to Subversion. 

Why switch?

Development teams decide to switch to Subversion for a wide variety of reasons. For many, SVN delivers new and desirable features which were lacking in their former system - usually because that system's design was much older than Subversion's. Such is the case with CVS, and for some commercial systems as well.

The second most frequent reason is the total cost of ownership. With commercial systems, companies usually pay surprisingly high yearly fees for support and upgrades. In times of intense competition and shrinking budgets, this expenditure can be hard to justify when you can get similar features for free with Subversion.

The last common migration reason, which usually combines with the two just mentioned, is that some systems offer complex features which, while not utilized by the team, still generate some extra work - by requiring users to do simple tasks in a complicated way, for example.

Version Control Babel and the Effect on Migration
Every version control system has its own approach to the problem domain, and its own concepts and semantics. Some names have completely different meanings in different systems (checkout, for example), while similar concepts often have different names in different systems (like revision and version).

Moreover, each system usually has a more or less different style of working, or provides a different way of performing the same use case, compared to other systems. People not aware of this usually ask questions like "How can I do XXX with Subversion?" and are not prepared to get an answer like "In your system you do XXX only as a part of YYY, but in Subversion YYY is done in a completely different way and therefore it doesn't make sense to do XXX."

For a successful migration, always keep these differences in mind and be prepared for the potential problems they can cause. The biggest problem - completely disqualifying SVN from consideration because its concepts were misunderstood - will probably not affect you if you have been following this series of articles (or if you go back and check them out).

A second common problem in migration arises when the innovator (the person or persons who researched version control systems and selected Subversion as the alternative) needs to convince the rest of the team of SVN's suitability for the given purpose. The innovator speaks the languages of both Subversion and the old system, while the rest of the team usually speaks just the language of the old system.

Another common problem is failure to adjust the development process/workflow to fit the new repository. Sometimes the process was designed to fit some way of working imposed by one version control system. In such cases, the process will need some adjustments to completely utilize the features of Subversion.

Preparing minds
To avoid the complications described above, it is crucial not to underestimate the psychological aspects of the migration process. Change forces people out of their "comfort zone" - so it's a good idea to help people establish a new one by doing team training on Subversion features and concepts before the actual migration takes place, and by discussing beforehand all aspects of the change, including how to do things in the new environment. This kind of collective brainstorming activity can result in even more improvements than those suggested by the innovator alone.

During the period between the training and the actual switch to SVN, the team members need to get used to the changes. Creating a "sandbox" repository, where developers can safely explore the new environment, can be a huge help in establishing a new comfort zone. As a bonus, the changes needed to adopt the new system (SVN in our case) can also be used to cure some bad habits of the development team.

Preparing data
A typical company or development team has a huge amount of different kinds of data under version control. The gamut ranges from "old" projects, which are no longer being developed and which are not even accessible any more (because they were created using a legacy version of the version control system, for instance), to "obsolete" projects, which are not used any more, on up to the actual active projects.

The migration process itself has a cost, so it is usually not wise to migrate every single versioned file in the whole company. Data must be readable to be migrated at all, so the inaccessible "old" projects mentioned above are out of scope. It can be helpful to classify existing data into three categories:

  1. Data that will not be migrated (such as legacy, unreadable data)
  2. Data for which only the HEAD (without history, tags, labels, etc.) will be migrated
  3. Data which will be fully migrated, including history, tags, labels, etc.

These categories can be based on frequency of their usage, frequency of modifications, and overall expected future use.
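The classification above can be sketched in a few lines of Python. This is only an illustration: the Project fields, the classify function and the threshold of 10 commits per year are all assumptions made for this example, and each team will want to choose its own criteria.

```python
from dataclasses import dataclass
from enum import Enum


class MigrationCategory(Enum):
    SKIP = 1       # category 1: not migrated at all
    HEAD_ONLY = 2  # category 2: latest state only, no history
    FULL = 3       # category 3: history, tags, labels, etc.


@dataclass
class Project:
    # All fields are hypothetical; collect whatever your old system can report.
    name: str
    readable: bool             # can the old system still read the data?
    commits_last_year: int     # rough measure of modification frequency
    expected_future_use: bool  # will anyone need this going forward?


def classify(project: Project, active_threshold: int = 10) -> MigrationCategory:
    """Assign a project to one of the three migration categories."""
    if not project.readable:
        # Unreadable data cannot be migrated, whatever its value.
        return MigrationCategory.SKIP
    if project.commits_last_year >= active_threshold:
        # Actively developed projects keep their full history.
        return MigrationCategory.FULL
    if project.expected_future_use:
        # Rarely changed but still needed: the latest state is enough.
        return MigrationCategory.HEAD_ONLY
    return MigrationCategory.SKIP
```

Running such a function over an inventory of all projects gives a first draft of the migration plan, which the team can then review and correct by hand.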

Migration of the second category is trivial - it means only exporting data from the old system into plain files and then importing those into SVN as any other new files from the file system.
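As a minimal sketch of the import half of that step, the Python below wraps the standard "svn import" command. It assumes the data has already been exported from the old system into a plain directory (how to do that depends on the old system), and that the svn command line client is installed. The function names, paths and repository URL are made up for this example.

```python
import subprocess
from typing import List


def build_import_command(export_dir: str, repo_url: str, message: str) -> List[str]:
    """Build the 'svn import' command line for a directory of plain files."""
    return ["svn", "import", export_dir, repo_url, "-m", message]


def import_head(export_dir: str, repo_url: str, message: str) -> None:
    """Run the import; raises CalledProcessError if svn reports a failure."""
    subprocess.run(build_import_command(export_dir, repo_url, message), check=True)


if __name__ == "__main__":
    # Hypothetical locations; replace them with your own export and repository.
    import_head(
        "/tmp/export/old-tool",
        "file:///srv/svn/repo/old-tool/trunk",
        "Initial import of old-tool (HEAD only, no history)",
    )
```

Recording in the log message that the import carries no history makes it easier to understand later why the first revision is so large.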

Migration of the third category is usually the most complicated and time-consuming part of the whole process, so let us take a close look at how you might approach it.

Migrating files with the history
This requires translation of data and concepts between different systems and therefore a special utility is needed. There are a number of them available online; most are able to process just one source system (like cvs2svn).

The most complete and mature utility seems to be SVN Importer (http://polarion.org/), which is an open source project sponsored by Polarion Software. SVN Importer can import data from CVS, PVCS, ClearCase, Visual SourceSafe, and MKS. It is implemented entirely in Java, and its modular structure allows other source systems to be added. The number of features varies for different source systems (CVS and PVCS being the most completely covered). New features and fixes are being continuously added, so coverage will improve with time. Even so, the migration process is not a simple and fast wizard-like procedure; it does require some tuning of the configuration for each individual case.

With some limitations, SVN Importer can also be used for incremental import of changes from a source repository, providing a way to keep SVN in sync with the other repository for a period of time.

Figure: SVN Importer Schema

The schema of SVN Importer's operation is shown in the figure. With the exception of CVS, the importer always requires the source repository software to be installed on the computer, and it uses that system's command line tool to access the source data.

In the first conversion step, all source data are analyzed and the source repository model is created. Next, the source model is transformed into the Subversion model. This process is highly sensitive to inconsistencies in the source repository. Even small problems, which can remain unnoticed during normal operation, usually come to light here, since all data are traversed. Such problems can include invalid commit dates, corrupted storage data, etc. These problems must be fixed before proceeding with the import.
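To give an idea of what a check for invalid commit dates might look like, here is a small Python sketch. It is not part of SVN Importer and does not reflect how the importer works internally. It assumes you have already extracted, by some means specific to your old system, the list of revisions of one file in commit order; the function name and the two rules it applies (no dates in the future, no date earlier than the preceding revision) are assumptions made for this example.

```python
from datetime import datetime
from typing import List, Tuple


def find_suspicious_dates(
    revisions: List[Tuple[str, datetime]], now: datetime
) -> List[str]:
    """Return a description of every revision whose date looks wrong.

    'revisions' holds (revision id, commit date) pairs in commit order.
    """
    problems = []
    previous = None
    for rev_id, date in revisions:
        if date > now:
            problems.append(f"{rev_id}: date {date.isoformat()} is in the future")
        elif previous is not None and date < previous:
            problems.append(
                f"{rev_id}: date {date.isoformat()} precedes the previous revision"
            )
        previous = date
    return problems


if __name__ == "__main__":
    # Made-up history of a single file, for illustration only.
    history = [
        ("1.1", datetime(2005, 1, 10)),
        ("1.2", datetime(2004, 12, 1)),
        ("1.3", datetime(2005, 2, 3)),
    ]
    for problem in find_suspicious_dates(history, now=datetime.now()):
        print(problem)
```

Running a check like this before the conversion can save time, because a report of suspicious entries is cheaper to produce than a failed import run.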

Once the SVN model is created, it is serialized into a Subversion dump file (a single file, used to re-create the whole SVN repository using the svnadmin tool). This is the most time-consuming part of the import process, since the content of every version of every file is processed. Progress can be watched in the log file generated by the importer. Once the dump file is created, it can optionally be loaded into the SVN repository by the importer itself, or this task can be left to the user (which is the recommended setting).
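If the loading is left to the user, it comes down to two standard svnadmin commands: "svnadmin create" to make an empty repository and "svnadmin load", which reads the dump file from standard input. The Python sketch below wraps them; the function names and paths are made up for this example, and it assumes svnadmin is installed and the target path does not yet exist.

```python
import subprocess
from typing import List


def build_create_command(repo_path: str) -> List[str]:
    """Command line that creates a new, empty repository."""
    return ["svnadmin", "create", repo_path]


def build_load_command(repo_path: str) -> List[str]:
    """Command line that loads a dump stream from standard input."""
    return ["svnadmin", "load", repo_path]


def load_dump(dump_file: str, repo_path: str) -> None:
    """Create the repository, then feed the dump file to 'svnadmin load'."""
    subprocess.run(build_create_command(repo_path), check=True)
    with open(dump_file, "rb") as dump:
        subprocess.run(build_load_command(repo_path), stdin=dump, check=True)


if __name__ == "__main__":
    # Hypothetical locations; replace them with your own dump file and path.
    load_dump("/tmp/import/full.dump", "/srv/svn/repo")
```

Loading the dump as a separate step lets you inspect the file, or try it on a scratch repository first, before touching the production server.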

Complete documentation of SVN Importer features is bundled in the binary distribution. A description of all the options can be found in the default configuration file. Both binary and source distributions are available for download from the project page at http://polarion.org/. The project page also includes a project forum, a link to the SVN repository with the sources, a tracker, and more.

Links

Subversion Book - http://svnbook.com/

Subversion Home Page - http://subversion.tigris.org/


About the author

Michal Dobisek (michal.dobisek AT polarion.com) is a Software Architect at Polarion Software (http://polarion.com/). He has experience with CVS, Perforce and Subversion, including two years of using, administering and tweaking Subversion. He holds a Master's degree in Cybernetics from the Gerstner Laboratory of the Czech Technical University in Prague.

CMCrossroads is a TechWell community.
