How to use DCVS Snapshots and Change Sets for Distributed Configuration Management

Experienced CVS users may wonder why we think that snapshots and change sets are `a good thing' (tm) for configuration management with CVS, and especially for distributed configuration management: there are already tags to label configurations, and change sets are implicitly defined by the difference between two tags. In the following paragraphs we will show the considerable benefits of these concepts and how they can be used efficiently in a distributed development scenario.

The text should also be interesting and understandable for readers who do not yet consider themselves (D)CVS experts or gurus.

What is the particular advantage of Snapshots over Tags?

Tags in CVS are mappings from symbolic names to revision numbers for a set of files. Branch tags are not different in this respect from ordinary tags; they just point to a special kind of revision number. There are two facts we'd like to point out:

  1. tags are not invariant with respect to time (there is no protection mechanism), and
  2. their values are scattered throughout lots of files,

i.e. they have no existence of their own and thus are not invariant with respect to different sets of files.

From the point of view of release engineering and reliable configuration management, tags are thus not really an adequate means to implement bound configurations. Bound configurations are the primary concept used to freeze known and working versions of a system, e.g. for releases or as internal baselines. A major requirement for any implementation of this concept is therefore that bound configurations are stable over time and do not change. CVS tags can be used for this purpose, but as pointed out above, it is not easy to guarantee that they do not change (there's no concept of locks or access control for tags within CVS), and their meaning is not complete unless they are complemented by a list of all involved files (a manifest).

As writing down manifests and other lists of necessary ingredients and prerequisites is one of the main tasks of configuration management, it's no big step from here to extend these lists by the concrete version of every file involved and store them as independent objects: and that's just what DCVS snapshots are.
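The difference between a tag and a snapshot can be illustrated with a small Python sketch. This is only a toy model under stated assumptions, not the DCVS implementation: the per-file tag tables, the file names, and the helpers `resolve_tag` and `make_snapshot` are all hypothetical and exist only for illustration.

```python
from types import MappingProxyType

# Hypothetical model: each file's history carries its own tag table,
# so the value of a tag is scattered over all files.
rcs_files = {
    "src/main.c": {"REL_1_0": "1.4"},
    "src/util.c": {"REL_1_0": "1.2"},
    "doc/README": {},  # never tagged: without a manifest we cannot tell
                       # whether it was meant to be part of REL_1_0
}


def resolve_tag(files, tag):
    """Collect the current value of a tag by visiting every file."""
    return {path: tags[tag] for path, tags in files.items() if tag in tags}


def make_snapshot(name, revisions):
    """Build a snapshot: one self-contained, read-only manifest
    that records the concrete revision of every file involved."""
    return name, MappingProxyType(dict(revisions))


before = resolve_tag(rcs_files, "REL_1_0")
snap_name, snapshot = make_snapshot("REL_1_0", before)

# Someone moves the tag on one file; the tag model has no protection.
rcs_files["src/main.c"]["REL_1_0"] = "1.5"
after = resolve_tag(rcs_files, "REL_1_0")
```

In this model the tag silently resolves to a different configuration after it has been moved, while the snapshot object still records the original revisions and rejects any attempt to modify it.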

Why are Change Sets useful for CM?

Most of the configuration management processes we've encountered so far in the companies and enterprises we've worked with are based upon the model of baselines and change sets. This is not very surprising, as this process model is rather obvious and intuitive. Baselines are stable configurations that are used as the base for any development, be it targeted at new features and functionality or at more stability, robustness, or efficiency (which is what is frequently needed for customer releases).

Change sets in release and configuration management:

New releases do not necessarily follow the branches evolving in the version control repository, but are rather evolutions of existing stable baselines by sets of known and tested changes. Thus baselines and change sets can be seen as higher level concepts of configuration management than tags and branches. They can also be viewed as the interface between the lower levels of the development process and a higher level in which project or product management is interested.

DCVS supports these concepts with its implementation of snapshots and change sets. Snapshots can be used to identify baselines, and change sets are the objects used to represent feature additions and bug fixes. Thus change sets are the kind of objects that need to be referenced in any process control or change management system.

Change sets viewed from the top-down perspective are the work results of change requests and bug reports and should thus be linked in the CM process to these issues.

Change sets in DCVS:

Change sets are the combination of two snapshots, or a snapshot extended by a second, different mapping for every file revision. Like snapshots, they are stored as independent objects and are more robust, stable, and reliable than two tags, for the same reasons as explained for snapshots above.
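As a rough illustration of "the combination of two snapshots", the following Python sketch derives a change set from two file-to-revision mappings. The `ChangeSet` class, the function `change_set_between`, and the choice to record only the files whose revision differs are assumptions made for this example; they do not describe how DCVS stores change sets.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Snapshot = Dict[str, str]  # file path -> revision number


@dataclass(frozen=True)
class ChangeSet:
    name: str
    # file path -> (old revision, new revision); None means "file absent"
    changes: Dict[str, Tuple[Optional[str], Optional[str]]]


def change_set_between(name: str, old: Snapshot, new: Snapshot) -> ChangeSet:
    """Pair up two snapshots and keep every file whose revision differs."""
    changes = {}
    for path in sorted(set(old) | set(new)):
        before, after = old.get(path), new.get(path)
        if before != after:
            changes[path] = (before, after)
    return ChangeSet(name, changes)


baseline = {"a.c": "1.3", "b.c": "1.1"}
fixed = {"a.c": "1.4", "b.c": "1.1", "c.c": "1.1"}

cs = change_set_between("bugfix-42", baseline, fixed)
print(cs.changes)
```

Here `a.c` moves from 1.3 to 1.4, `c.c` is newly added, and the unchanged `b.c` does not appear in the change set.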

Why Change Sets are especially useful for Distributed CM

The preceding sections describe the importance of snapshots and change sets for the general process model of change management based on baselining. Now we'll show that change sets are even more important for distributed development scenarios.

Most distribution models of version control repositories use dedicated development lines (LODs, branches) to avoid conflicts when creating changes in local database replicas. DCVS is not different in this respect from e.g. Rational's ClearCase (tm). For the developers this means that changes can only be committed to locally mastered branches. The locally created branches are usually a special subset of these locally mastered branches. This means that it is always possible to commit a set of changes to a new branch created on the fly, but it may be impossible to commit changes to other lines of development (due to administrative or efficiency restrictions).

When teams at different sites cannot commit to every branch, it may not be appropriate to use branches as the direct implementation of product- or customer-specific development lines. From the configuration management view it may be much more effective to map development lines to a set of branches in the repository, or better still to a set of change sets. This is exactly the purpose of snapshots and change sets in DCVS: they are the high-level means to model baselines and development lines in a distributed scenario. Every single change may be committed to the repository on its own branch, thereby creating a unique change set, which can then be applied to many other baselines and merged into other development lines. To make this kind of CM process easy to implement, snapshots and change sets have been added to DCVS as independent, uniquely named, and reliable objects.
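What "applying a change set to another baseline" could mean is sketched below in Python. This is a simplified model under our own assumptions, not the DCVS merge procedure: the function `apply_change_set`, the dictionary layout, and the rule "take the new revision only if the baseline still holds the old one, otherwise report the file for a manual merge" are all hypothetical.

```python
def apply_change_set(baseline, changes):
    """Apply a change set to a baseline.

    baseline: dict mapping file path -> revision
    changes:  dict mapping file path -> (old revision, new revision)
    Returns (new_configuration, files_needing_merge).
    """
    result = dict(baseline)  # never modify the baseline itself
    needs_merge = []
    for path, (old_rev, new_rev) in changes.items():
        if baseline.get(path) == old_rev:
            # The baseline holds exactly the revision the change started from.
            if new_rev is None:
                result.pop(path, None)
            else:
                result[path] = new_rev
        else:
            # The baseline has diverged; a real merge would be required.
            needs_merge.append(path)
    return result, needs_merge


# A fix committed on its own branch (branch revisions such as 1.3.2.1).
fix = {"a.c": ("1.3", "1.3.2.1"), "b.c": ("1.1", "1.1.2.1")}

# A customer baseline that has already moved on for b.c.
customer_baseline = {"a.c": "1.3", "b.c": "1.2"}

config, conflicts = apply_change_set(customer_baseline, fix)
print(config, conflicts)
```

In this example `a.c` can be taken over directly, while `b.c` is reported as needing a merge because the customer baseline no longer contains the revision the fix was based on. The baseline itself stays untouched, which matches the idea that baselines are stable objects.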