There's no doubt about it. The easiest way to do software development
is at a single site. Technology has not changed that. However,
technology has made it a lot easier to do distributed development, even
when the sites are distributed halfway around the world. Net
meetings, mirror sites, megabit global web access, collaborative
software for reviewing documents and code - these all make it a lot
easier. What about the CM part of it? Why is it not quite as easy to
do globally? Well, when you get outside of the developer sandboxes,
there are a number of factors which complicate life. So we need to
look at what architecture is needed to do the job right.
Let's first agree that any personal issues such as communication, conflict resolution, politics, etc. are amplified tremendously in a global development project. It's always easy to pass the buck or point the finger halfway around the world. And it's no different with global CM. However, let's not forget that a good CM tool is also a great communication tool. And it leaves traceability tracks all over the place so that we don't have to point fingers - we can just accept the facts. So we'll look at the architecture of the technology needed to ensure we have good tools for global development, and leave the organizational attributes for someone else to deal with.

Global CM Architectures

There are as many different global CM solutions as there are tools that support this capability. However, global CM architectures can be grouped into three classes: Central Site with Remote Clients, Distributed Data, and Multiple Site Repository.
Global CM Factors

Not surprisingly, there are quite a number of factors that have to be considered when selecting a global CM solution. Some of them are very architecture specific while others are just a fact of life. The list here is fairly complete, but I'm sure if you've tried a solution that wasn't successful, you'll have a few items to add - in fact, please do comment on this article online, and add the ones I've missed.

Network latency: Latency will be an issue with any global solution. Round trip signaling across 8000 km requires more than 1/20 of a second, even at the speed of light. And that's before we add in things like firewalls, encryption, routing and virtual private networks. Repeated confirmed communications during an operation can easily introduce significant delays. For example, if you have to run a script which is constantly querying remote data, there can be a delay of several seconds or even several minutes. Ideally, all of the data you need to access is local and the delays are minimal.

Synchronization latency: If data is stored in more than one location, it is important that updates to the data are synchronized frequently or else that a strategy is used to partition the data in such a way that the latency is not so important. For example, partitioning branches of a file so that only a specific site can work on a particular branch eliminates latency, but also reduces flexibility. Segregating products across sites would help, but that is not so much a global development scenario as a system integration issue.

IDE integration: Some, and perhaps most, IDE tool integrations work with both the workspace and the CM repository data to ensure a consistent and up-to-date view of a "project". In this scenario, if the IDE has to go to a remote location to get CM data, there can be both significant delays and false views of the data. This in turn can cause access to the appropriate options and features to disappear (e.g.
Check-Out will appear only if the IDE thinks it's not already checked out).

Network Outages: A network outage can have a number of different effects on a solution. For example, how does the solution respond when it is synchronizing data? What happens to users when their clients can't see one or more other sites? How much effort is required to recover from the outages and how much down-time results? How sensitive is the solution to brief outages? Is there a separate game plan for extended outages of a few days, or even weeks (e.g. undersea cable severed)?

Reliability/Points of Failure: The number of points of failure in a system can impact accessibility considerably. But it's also important to see how solutions react to outages. For example, if the repository of a central site solution is inaccessible, it's likely that access is disabled for everyone. Points of failure include both network links and serving nodes. If data is distributed across dozens of sites, most data may be accessible always, but some of the data may be inaccessible frequently, especially if each site has a scheduled down time (e.g. for maintenance). If reliability of the solution is poor with just two sites, multiplying the number of sites is certain to multiply the reliability problems.

Performance and Network Bandwidth: All global solutions require a minimum bandwidth to be successful. This applies both to intensive synchronization operations and to large transactions (e.g. video training and demo files). Does the solution allow for special treatment of very large files in the case of limited bandwidth? How well does it deal with network outages during transmission across low bandwidth links? Does the bandwidth affect the overall performance of the user interface, especially in a transaction-based system of a multiple site repository?

Set-up: We might assume that all global solutions require some level of set-up.
Perhaps this is not the case for a Central Site solution using a Web-only interface. But generally, when a new site is added, there has to be some initial data transfer to the site to establish a base point from which the solution operates. How is this data snapshot established? How quickly after the snapshot can the original site continue? How quickly does the new site have to be up and running after the snapshot, or after the data transfer is complete? Is it a gating factor to restoring normal operation at the other sites? How easy is it to eliminate a site? Or to take it off-line temporarily and restore it?

Administration (including partition/synchronize): Administration of a global solution is often a reason for not attempting it. Yet some solutions are fairly light-weight on administration. How much work is required to partition data and to maintain the partitioning at the appropriate boundaries? How easy is it to adjust the partitions or move them from one site to another? How often is synchronization done? Is it automated? What happens if a problem crops up? How much effort is it to swap out hardware? What about upgrading the OS?

ALM Coverage: A global solution should cover the entire life cycle. Often, the ALM solution is provided across multiple tools. Does this mean that you need separate global solutions - one per tool? And if so, do they work similarly? One particular area of concern is how the solution covers source and documentation files as opposed to the rest of the data (e.g. file meta-data, problem reports, test data). Solutions that have grown up dealing primarily with files will typically need a parallel solution to deal with data.

Upgrades: Upgrading tools can be a significant task, but even more so with a global solution. There's coordination with the various sites to start with, and that's likely across time zones. Then there's coordination across tools of the ALM solution.
Then there's the effect of the upgrade on the multiple sites. How long does it take (down time)? Can it be done from a single site? Do all upgrades have to be applied at all sites simultaneously or can different versions of the tools be running at different sites? Ideally a global solution has a global upgrade solution with it.

Access to data: What type of data access restrictions does each solution have? Does it differ for each component of your ALM? If you only have access locally (i.e. at one site) to a portion of the data, can the rest of the data still be remotely queried? What data can be updated from which sites? How is this communicated to users and how can it be changed when necessary? If you don't have access to the entire set of source files for your product, how and where are your builds performed? Does this affect synchronization frequency? How long does it take for your updates to be visible at other sites?

Data Segregation: Often it's desirable to segregate parts of your CM repository from other sites. The U.S. government's ITAR requires it. Normal contractor/sub-contractor privacy and security concerns require it. Yet, the bigger the picture available to each team member, the better the results. Often a multiple site solution is used to partition data between the contractor and sub-contractor. If this is a requirement, each solution will have its advantages and disadvantages here.

Roaming Between Sites: If your ALM tool is used by management and executives, it's likely that they'll want to be able to access it from any site (OK, ideally from anywhere period). Does this require an extra level of administration, or is it even possible to look at data one day in Dallas and continue reviewing it the next day in the London office? Does the data have to be exported and then synchronized again in order to support roaming? Do user permissions or user interfaces change as someone roams between sites?
This is even more important when development team members visit other development sites.

Workspace access: Access to the repository data is one thing, but what about access to your workspace data? If your workspace is fixed to a site, this can make roaming difficult. Just because it's on a thumb drive, it doesn't mean that it can function properly at another site without appropriate action. If you have partially populated workspaces (e.g. incremental), does the rest of the system reside locally so that you can do your builds? And if so, what happens if you move to a different site - is it still accessible?

Backup Consistency: Backup consistency is often a problem with CM/ALM solutions, especially when separate tools provide each of the ALM components. But it's a lot worse if your data doesn't reside all at one site. For this reason, if you're looking at a distributed data architecture, you'd better make sure that there's a backup strategy that supports it. One issue is simply getting a backup at a single (logical) point in time, across multiple sites. Another is dealing with backups while synchronization is going on. A third, and often overlooked, issue is ensuring that the backup can be successfully restored so that all of your sites are back up and running. A related issue that affects more than one architecture is dealing with any administrative fix-ups to the repository or environment that might have to be repeated at each site to maintain synchronization and consistency.

Disaster Recovery: Although not a requirement, some global CM solutions also provide disaster recovery capabilities. Especially if your solution addresses the entire ALM function, it is possible that your entire project IT will be disaster-proof because it is replicated elsewhere. This might be the case in a Distributed Data architecture where the main site is not used for doing updates (other than synchronizing the other sites to the main site).
It might be the case with a Central Site architecture if it has a fairly frequent remote mirroring capability. A Multiple Site Repository architecture will normally have this capability. Still, in each of these architectures, it will take time to recover and it's possible that a restore from backup might be the quickest route.

Monitoring Capabilities: A site can be across the world or across the ocean. If you have a solution that will run for a year or two in between issues, you have my permission to ignore this section. But most solutions are going to have some issues. It may be that the IP address suddenly changed in the middle of the day. It may be that a file showed up empty when it got to where it was going. It may be that the OS release 6.4.1.31 is not compatible with 6.4.1.32 in a specific area affecting your solution. It may be that you have an issue-prone solution. You will need to monitor the state of your global project at different times and for different reasons. If it's just to say that everything is going well - great, at least you'll sleep better. If you're upgrading and you need to make sure everything is working well, or if you're setting up a new site and want to verify it's working, it helps tremendously to have built-in monitoring capabilities. These might be as simple as remote log-ins to the sites around the globe so that you can look at the state of things. Even better, you have a way of asking for the state of the world and having it sent to your site. Perhaps you even have a regular email letting you know of any issues detected by the global CM audits. The more the merrier. It only takes one trip to prove that to you - unless you live in the cold north and you have to fly south in mid-winter.

Customization: In a single site solution, you customize your tools to support your processes. A few data schema changes, some process workflow changes, permission changes, and of course a number of user interface customizations.
In a multiple site solution, it's not always so easy. Do you have to make all of the changes to all of the sites? If not, how can you guarantee the results when the data is moved to another site? Does one person or team do the customizations or does each site have permissions to advance the same or different parts of the ALM function? Ideally, all customizations made at one site instantly appear at the other sites, with no interruption. But this is not easily accomplished, so you have to understand your customization limitations or risk withdrawing from your continuous process improvement program.

Licensing Costs: What's it going to cost to go global? Well, the other factors have given you some ideas. But then there's licensing. Some tools and some architectures will give it to you for free, and even save you operating costs, while others may effectively double the price of your solution. Open source looks good here - if all of the other problems are dealt with. But any Central Site solution and some other solutions are not expensive to move to global solutions from a licensing perspective. After all, the vendor is already getting paid for your use of its software, so why should you have to pay more just to say where you're going to use it?

Time zones: Time zones are a key consideration of any global solution because they will affect schedules (e.g. daily builds) and communications. Perhaps your time zones are just perfect so that the East can update the repository from 6am to 6pm, while the West updates it in the other 12 hours. Some tools and some global solutions might let you take advantage of time zones. Others won't care at all. Whether or not this is a true factor in your case, you'll have to determine on your own.

Training: If your solution requires developers and other team members to work differently, you'll have to add in some training. Maybe an hour, maybe a day.
If it just requires remote access or similar access from a different site, you might get away with no training. Almost certainly, there will be administration training required, and possibly some CM training. The training might just be centred around how to minimize performance hits. Or it may involve the full suite of partitioning/synchronization issues along with backup consistency, recovery from a network outage and many of the other factors we've already dealt with.

We've spent quite a bit of time identifying most of the key factors affecting global solutions. Hopefully this will help as we go through the architectures in more detail, taking a look at the pitfalls and benefits of each.

Central Site, Remote Client

The central site solution has a single central repository for all of the CM or ALM data, depending on the tool's reach. All repository transactions are applied to the central repository and all data access is made through this central repository. To facilitate the global solution, a (typically thin) remote client is used to access the repository. There are multiple modes of operation possible for the remote client. Here are a few of them:

(a) Use a workspace at the central site which is only viewed remotely (very thin client). This also includes the possibility of using a Web client.

(b) Use a remote workspace with the central site. In this scenario, the client's IDE is locally resident and can operate on files until they are ready to be checked in. Other ALM functions (e.g. problem reports) are typically handled directly through the central site, with a remote client used to provide forms for input/modification, and dialog panels for queries. Query results are either embedded in a web page result on the client machine or are shipped back (e.g. as files) to the remote client for inspection.
(c) Use a remote workspace with the central site as above, and a thin client to present forms and queries, but use mirroring technology to mirror the repository at the remote site (or closer to it). This allows quicker access to the data and has various robustness properties.

The remote client solution seems rather clean. As long as you can communicate with the central site, you can perform your CM tasks, be they check-in, check-out, problem status, task assignment, whatever. Performance issues can be handled, to some extent, through the use of caches. This can be a complex undertaking, however, so be certain that any caching capability is automated. Mirroring (i.e. replication for read access) is often a more robust caching strategy.

Advantages:
Distributed Data

The distributed data solution, commonly referred to as a (1st generation) multiple site (or multi-site) solution, deals with selecting specific data to be distributed to one or more other sites. The data may be updated only at those sites, but is regularly synchronized back to the "main" site so that a consistent up-to-date picture is available. In some cases, the data may be updated at more than one site by creating a branch for each site, and then merging the branches shortly after (or as part of) the synchronization process. Of course in this scenario, synchronization can be a partially manual task and can result in some headaches, especially if the changes are made in a time zone far removed from the main site.

These systems have evolved and can include a combination of distributed data (from a data ownership perspective) and mirroring (from a data access perspective). The key point is that the site which owns the data has the right to update it, and typically has the most up to date version of the data, pending synchronization.

Advantages:
Each developer, or each group, could have one or more such partitions. So the change control and CM repository was essentially exported to a sub-group. The model was also used to do promotion levels, where all of the code was checked out of a production "library" to create an integration library, and all of the code checked out of the integration library to form a development library. Every so often, the integration library was promoted to (i.e. checked in to) production (after having several additional change packages applied to it), and the development library, when stable, was checked into the integration library. I do remember some problems during these major check-in operations, but for 30 years ago, it was fairly impressive. And we used it locally, not to address a global CM requirement. But it never became a commercial product and eventually was superseded when its scalability limit was reached at around a couple million lines of code. Don't forget, computers (i.e. mainframes) were thousands of times slower back then.

Multiple Site Repository

A multiple site repository solution works by replicating, in "real time", the repository at multiple sites. This is most easily done by sending all repository transactions to all sites and ensuring that they are applied at each site in the same order. This means that there are one or more servers at each site which are both managing transaction traffic and running the transactions against the CM/ALM repository. In this scenario, as in the Central Site scenario, there is no need to partition or synchronize data. All data is available to all users. As with the other scenarios, all of the ALM components must support this scheme or you have a hybrid set of multiple site solutions.

In theory, administration should be light, as the same transactions are processed at all sites. In practice, I would want some monitoring tools to ensure that the sites are indeed running in sync, especially if the networks are unreliable.
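
To make the "same transactions, same order" idea a little more concrete, here is a minimal Python sketch of it, along with the kind of sync check a monitoring tool might run. It is an illustration only, not how any particular CM tool implements replication. The names SiteReplica and in_sync are invented for this example, the repository is reduced to an in-memory dictionary, and it assumes that something upstream has already given every transaction a global sequence number.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SiteReplica:
    """One site's copy of the repository (hypothetical, in-memory only)."""
    name: str
    state: dict = field(default_factory=dict)
    last_applied: int = 0                         # highest sequence number applied
    pending: dict = field(default_factory=dict)   # transactions that arrived early

    def receive(self, seq: int, key: str, value: str) -> None:
        """Buffer a transaction, then apply everything that is now in order."""
        if seq <= self.last_applied:
            return  # duplicate delivery, already applied
        self.pending[seq] = (key, value)
        while self.last_applied + 1 in self.pending:
            k, v = self.pending.pop(self.last_applied + 1)
            self.state[k] = v
            self.last_applied += 1

    def digest(self) -> str:
        """Fingerprint of the repository contents, used for sync monitoring."""
        canonical = repr(sorted(self.state.items())).encode("utf-8")
        return hashlib.sha256(canonical).hexdigest()


def in_sync(sites: list) -> bool:
    """True when all sites have applied the same transactions with the same result."""
    return len({(s.last_applied, s.digest()) for s in sites}) == 1


# Assumed example data: three numbered transactions against two sites.
site_a, site_b = SiteReplica("site_a"), SiteReplica("site_b")
transactions = [(1, "main.c", "v1"), (2, "main.c", "v2"), (3, "util.c", "v1")]

for txn in transactions:
    site_a.receive(*txn)

# Site B gets transactions 3 and 1 first; transaction 2 is delayed by the network.
site_b.receive(*transactions[2])
site_b.receive(*transactions[0])
behind = not in_sync([site_a, site_b])    # site B is still waiting for number 2

site_b.receive(*transactions[1])          # the delayed transaction arrives
caught_up = in_sync([site_a, site_b])     # both sites now hold the same data
```

The point of the buffering is that a site never applies transaction 3 before transaction 2, so a late arrival delays a site but cannot leave it with different contents. A real solution also has to decide who assigns the sequence numbers and what to do during a long outage, which this sketch does not attempt.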
One of the nicest aspects of this solution is that, if implemented properly, it looks to each user as if it is a single Central Site solution. At any site the user interface is the same because all of the data is the same. The user only needs to remember to bring his/her workspace from one site to another when travelling between sites. Of course, in this day of laptops and thumb drives, this is easy enough to do.

Another benefit is that each site acts as a backup to all of the other sites, since all sites receive all of the transactions and all of the data. This serves both as a first round defense for backups, and as a disaster recovery capability (i.e. by giving clients access to the other sites).

The down-side is that all sites have all of the data - so if you're running a sub-contractor site, they see all of the repository data. The workaround here is to use separate repositories for sub-contractor data, but this is not really satisfactory as queries and operations across the project data become more difficult. This is where the integration of Data Segregation features with a multiple site repository solution comes in handy.

One of the difficulties with multiple site repositories occurs when there is a long network outage. The affected (i.e. cut-off) site cannot get its transactions sent to other sites, or even ordered properly so that they can be applied locally. However, at least one vendor using this multiple site scenario has identified how to continue working in a "partitioned" full database scenario and then to semi-automatically restore synchronization after the outage is over.

Advantages:
If you take a look at today's CM tools, you'll find that you can't generally say that each tool falls into a specific architectural category. Many tools have capabilities that span the various architectures. You might generally categorize some of the tools. For example, Perforce(R) has primarily a Central Site architecture. ClearCase(R) has primarily a Distributed Data architecture. CM+(R) has primarily a Multiple Site Repository architecture. However, you can certainly use any remote access tool to access data in both ClearCase and CM+. So they have a Central Site/Remote Client flavour. And that goes for any tool that has a Web-based user interface that can be used for CM. But you don't automatically pick up all of the features if the remote access interface is not tuned to the tool. For example, if you have remote access to ClearCase through "VNC", you can't take advantage of the virtual file system capabilities that let your machine access files directly from the repository.

As well, some tools have multiple global CM solutions. The most common division point is having one method for source code management, and another for data management. The data management solution might be a feature of the underlying repository. Or it might be a web-interface which acts as a window to the repository. So some data might fall into one architectural category and other data into another.

The Next Generation Global CM Solution

So which of these architectures do I like best? And what is the solution that will take us into the next generation? Looking at the advantages and disadvantages, I can clearly say that I like a solution that incorporates some of each of the architectures. I want the benefits of each of the architectures and don't want the disadvantages. So consider this as a Next Generation Global CM solution:

A Multiple Site Repository architecture, but one which allows remote clients. This adds in most of the benefits from the Central Site architecture.
Now add in some data segregation capability and the ability to partition the multiple site network into two or more networks (in case of extended outages) and then to semi-automatically re-synchronize when the outage is resolved.

This gives a solution that has additional overhead in the case of a long network outage, and in the case that you want to take advantage of data segregation. As well, data segregation is going to eliminate some disaster recovery capability. However, this can be countered by adding additional sites whose purpose is only to receive transactions for disaster recovery and backup purposes. Consider a three-site operation, where sites A and B have the full set of data and site C, a contractor site, has a portion of it. Introducing a site D, which is identical to C, will restore full redundancy for the contractor site. Similarly, if B and C were both contractor sites, you'd need to introduce A', B' and C' sites as designated redundancy sites for each of the operational sites, each of which could be different. Or perhaps your tools would allow your full site to be readily converted to a contractor (partial) site without having to keep separate contractor redundancy.

One more thing I'd like is for the solution to be able to run on different architectures across the network. This isn't a big thing as it's not that expensive to buy the same servers for each site. But it does make a big difference if you're upgrading a server and can switch to another box temporarily, or if you're changing your IT strategy from Unix to Windows to Linux or vice versa. A cross-architecture solution allows you to focus on one site at a time and gives you added flexibility in your timing and order of conversion.

The only real drawback of building upon a multiple site repository solution is the added licensing costs for each additional site.
However, if the licensing is just a server cost, and not also a multiple-site capability cost, then the added cost might be better viewed as a cost that would be incurred anyway for multiple site operation (though the Central Site solution does not incur this cost directly). You should expect to see multiple-site capability as a commodity feature in next generation CM tools. Another factor is that with some tools, you need multiple servers even for a single site solution - so the cost is incurred then, not by the multiple site requirement (unless there is a separate cost imposed), but simply as part of the solution scalability. With a single toolset integrated solution, the server cost factor would likely be lower (single server replication) than if multiple solutions make up your ALM function.

I like to look at a Global CM solution as a single site solution that runs over multiple sites. It looks and acts like one site, but is actually a full multiple site solution and (almost) nobody has to ever know the difference. I can add or remove sites - no impact on users. I can move users from one site to another - no impact on them.

My Preference

I've used the three architecture-specific models from time to time over the past 30 years. The partitioned solutions (Distributed Data) were more reasonable 20 to 30 years ago because networks were slow and data storage was expensive. They were also a bit more archaic back then, though they were very well automated when problem areas were avoided.

I like Central Site architectures too. But there's no reason that these benefits can't simply be added on top of the others. Remote clients are a fact of life and so we have this capability today, regardless of what other solutions there are. So I expect this capability, but I simply don't see it as adequate on its own to solve a Global CM problem, at least not in site-clustered development. For really distributed development, such as Open Source development, it's a good solution.
Today, I prefer the Multiple Site Repository architecture. But it has to be technically sound and have good performance. Fortunately, the one I use is and does. More than once I've experienced (server) disk crashes without such an architecture, and more than once I've experienced such crashes with such an architecture. With it, once you've identified that you have a problem, you just switch your client to another server and continue working - without noticing any differences. It would be even better if the cut-over happened without my even knowing (or perhaps with just an information note). The rapid disaster recovery is what sold me - the multiple site capability was a side benefit.

I've also worked with clients using this architecture globally. The hands-off operational aspect has been a very key factor for them. The feedback is that it operates as a single site solution except that upgrades have to be done at multiple sites, though this is normally administered from a single site.

Whatever your preference, do not commit to a global CM solution without first trying it. If it's too difficult to set up, or if you can't easily run a few iterations and get a good feel for the ongoing maintenance, you're taking a big gamble. It's not hard to simulate your required bandwidth in an evaluation scenario. And the scenario should be on your hardware with your networks. You want to see the impact on your network, especially if you have limited bandwidth between sites. Ask your vendor what can go wrong and simulate that if possible. Simulate network outages and recovery. I've seen problems with some OS platforms because the CPU resources were far in excess of network bandwidth - it couldn't handle the blocking/buffering.

And Beyond

The Next Generation Global CM solutions will also provide unexpected benefits as the solution scales down to smaller organizations (assuming near-zero administration). A few examples:
Joe Farah is the President and CEO of Neuma Technology and is a regular contributor to the CM Journal. Prior to co-founding Neuma in 1990 and directing the development of CM+, Joe was Director of Software Architecture and Technology at Mitel, and in the 1970s a Development Manager at Nortel (Bell-Northern Research) where he developed the Program Library System (PLS) still heavily in use by Nortel's largest projects. A software developer since the late 1960s, Joe holds a B.A.Sc. degree in Engineering Science from the University of Toronto. You can contact Joe at farah@neuma.com
Last Updated on Monday, 14 April 2008 04:54



