Distribution
The issue of distribution often gives rise to heated debate in the context of SCM.
Linus Torvalds uses it as a criterion to ridicule CVS and Subversion (centralized systems), and so presents the distributed design of git as a decisive advantage.
ClearCase, in its original (non-UCM) design, required a distributed architecture (MultiSite), but in its UCM extension favours snapshot views, which suit a centralized one.
An SCM system that intends to offer some management of changes before they are committed, i.e. to support the developers themselves, soon requires direct and synchronous access to the repository, and thus a distributed architecture.
Centralized architectures can in fact only pretend to offer some management of change tokens, after the real work has been accomplished.
This was the rationale of the original ClearCase.
Replication in ClearCase is asynchronous and on an equal footing, all sites being structurally equal.
Now, this unfortunately fell short of supporting auditing, as derived objects were not, and still are not, replicated (probably rightly so).
Downloading a file incurs incompressible delays, whereas materializing it locally does not (i.e. it incurs some delays, but not incompressible ones): only the latter is acceptable in a synchronous context.
Nevertheless, downloading the file is often faster under given circumstances, or is even the only solution (for various possible reasons).
Such a reversal of paradigms has taken place in many 'build' systems like Maven or Buckminster, with the remote resources possibly being cached locally.
The missing synthesis for SCM systems truly based on auditing would be to accept one additional level of indirection, so as to treat as equivalent whichever occurrence happens to be the fastest to obtain.
Configuration records (bills of material) would indeed need to be replicated, in order to allow the necessary lazy identification.
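
The extra level of indirection described above can be sketched as follows. This is only a minimal illustration in Python, not ClearCase's actual mechanism: the names ConfigRecord, Source and materialize are hypothetical, and so are the choice of a hash over the configuration record as the identity of a derived object and the per-source cost estimates. The point is that the (replicated) record identifies the object, while the object itself is obtained lazily from whichever source is expected to be fastest.

```python
import hashlib
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ConfigRecord:
    """Hypothetical bill of material for one derived object."""
    target: str
    inputs: Tuple[Tuple[str, str], ...]  # (element path, version) pairs
    script: str                          # build script that produced it

    def identity(self) -> str:
        # Assumption: two occurrences are equivalent when their
        # records hash to the same value, regardless of input order.
        h = hashlib.sha256()
        h.update(self.target.encode() + b"\0")
        for path, version in sorted(self.inputs):
            h.update(f"{path}@{version}".encode() + b"\0")
        h.update(self.script.encode())
        return h.hexdigest()


@dataclass
class Source:
    """One way of obtaining an occurrence (local build, download...)."""
    name: str
    estimate_seconds: Callable[[ConfigRecord], float]  # math.inf if unavailable
    fetch: Callable[[ConfigRecord], bytes]


def materialize(record: ConfigRecord, sources: List[Source]) -> Tuple[str, bytes]:
    """Obtain the object from the source with the lowest estimated delay."""
    usable = [(s.estimate_seconds(record), s) for s in sources]
    usable = [(cost, s) for cost, s in usable if cost != math.inf]
    if not usable:
        raise LookupError(f"no source can provide {record.target}")
    _, best = min(usable, key=lambda pair: pair[0])
    return best.name, best.fetch(record)


def fake_object(record: ConfigRecord) -> bytes:
    # Stand-in for real content: every source yields the same bytes
    # for the same identity, which is what makes them interchangeable.
    return ("object " + record.identity()[:12]).encode()


record = ConfigRecord(
    target="foo.o",
    inputs=(("foo.c", "/main/3"), ("foo.h", "/main/1")),
    script="cc -c foo.c",
)
local_build = Source("local build", lambda r: 2.0, fake_object)
remote_site = Source("remote download", lambda r: 0.5, fake_object)
offline_site = Source("offline replica", lambda r: math.inf, fake_object)

chosen, data = materialize(record, [local_build, remote_site, offline_site])
print(chosen, data.decode())
```

With these made-up estimates the remote download wins; swapping the two figures would select the local build instead, without the caller having to know which one was used.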
A note on snapshot views
When laptop computers first arrived on the market, nobody thought of using MultiSite as the basis of a solution to support them.
They obviously didn't have the processing power, the disk space or the memory needed to host a standalone, replicated site.
Also, the design of MultiSite would not easily support such a pattern of use (multiple replicas, with sporadic connectivity).
Thus, snapshot views were brought in, as a necessity.
The performance shortcomings of laptops are now largely a thing of the past.
Now, one could consider getting rid of snapshot views...
-- MarcGirod - 15 Jun 2007