GitHub was launched in 2008. If your software engineering career, like mine, is no older than GitHub, then Git may be the only version control software you have ever used. While people sometimes grouse about its steep learning curve or unintuitive interface, Git has become everyone's go-to for version control. In Stack Overflow's 2015 developer survey, 69.3% of respondents used Git, almost twice as many as used the second-most-popular version control system, Subversion.[1] After 2015, Stack Overflow stopped asking developers about the version control systems they use, perhaps because Git had become so popular that the question was uninteresting.
Git itself is not much older than GitHub. Linus Torvalds released the first version of Git in 2005. Though today younger developers might have a hard time conceiving of a world where the term "version control software" didn't more or less just mean Git, such a world existed not so long ago. There were lots of alternatives to choose from. Open source developers preferred Subversion, enterprises and video game companies used Perforce (some still do), while the Linux kernel project famously relied on a version control system called BitKeeper.
Some of these systems, particularly BitKeeper, might feel familiar to a young Git user transported back in time. Most would not. BitKeeper aside, the version control systems that came before Git worked according to a fundamentally different paradigm. In a taxonomy offered by Eric Sink, author of Version Control by Example, Git is a third-generation version control system, while most of Git's predecessors, the systems popular in the 1990s and early 2000s, are second-generation version control systems.[2] Where third-generation version control systems are distributed, second-generation version control systems are centralized. You have almost certainly heard Git described as a "distributed" version control system before. I never quite understood the distributed/centralized distinction, at least not until I installed and experimented with a centralized second-generation version control system myself.
The system I installed was CVS. CVS, short for Concurrent Versions System, was the very first second-generation version control system. It was also the most popular version control system for about a decade, until Subversion, first released in 2000, began to displace it. Even then, Subversion was supposed to be "CVS but better," which only underscores how dominant CVS had become throughout the 1990s.
CVS was first developed in 1986 by a Dutch computer scientist named Dick Grune, who was looking for a way to collaborate with his students on a compiler project.[3] CVS was initially little more than a collection of shell scripts wrapping RCS (Revision Control System), a first-generation version control system that Grune wanted to improve. RCS works according to a pessimistic locking model, meaning that no two programmers can work on a single file at once. In order to edit a file, you have to first ask RCS for an exclusive lock on the file, which you keep until you are finished editing. If someone else is already editing a file you need to edit, you have to wait. CVS improved on RCS and ushered in the second generation of version control systems by trading the pessimistic locking model for an optimistic one. Programmers could now edit the same file at the same time, merging their edits and resolving any conflicts later. (Brian Berliner, an engineer who later took over the CVS project, wrote a very readable paper about CVS' innovations in 1990.)
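To make the pessimistic model concrete, here is a minimal Python sketch of exclusive per-file locking. It only illustrates the idea and says nothing about how RCS is actually implemented; the LockingStore class, its method names, and the file and user names are all invented for this example.

```python
class FileLocked(Exception):
    """Raised when someone else already holds the lock on a file."""


class LockingStore:
    """Toy model of pessimistic locking: one editor per file at a time."""

    def __init__(self):
        self._locks = {}  # file name -> user currently holding the lock

    def lock(self, filename, user):
        holder = self._locks.get(filename)
        if holder is not None and holder != user:
            raise FileLocked(f"{filename} is locked by {holder}")
        self._locks[filename] = user

    def unlock(self, filename, user):
        # Only the current holder can release the lock.
        if self._locks.get(filename) == user:
            del self._locks[filename]


store = LockingStore()
store.lock("parser.c", "alice")        # alice may now edit parser.c
try:
    store.lock("parser.c", "bob")      # bob asks for the same file
    bob_got_lock = True
except FileLocked:
    bob_got_lock = False               # bob has to wait
store.unlock("parser.c", "alice")      # alice finishes editing
store.lock("parser.c", "bob")          # now bob can proceed
```

The optimistic model drops the lock step entirely and defers the check until someone tries to commit.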
In that sense, CVS wasn't all that different from Git, which also works according to an optimistic model. But that's where the similarities end. In fact, when Linus Torvalds was developing Git, one of his guiding principles was WWCVSND, or "What Would CVS Not Do." Whenever he was in doubt about a decision, he strove to choose the option that had not been chosen in the design of CVS.[4] So even though CVS predates Git by nearly two decades, it influenced Git as a kind of negative template.
I've really enjoyed playing around with CVS. I think there's no better way to understand why Git's distributed nature is such an improvement on what came before. So I invite you to come along with me on an exciting journey and spend the next ten minutes of your life learning about a piece of software nobody has used in the last decade. (See correction.)
Instructions for installing CVS can be found on the project's homepage. On macOS, you can install CVS using Homebrew.
Since CVS is centralized, it distinguishes between the client-side universe and the server-side universe in a way that something like Git does not. The distinction is not so pronounced that there are different executables. But in order to start using CVS, even on your own machine, you'll have to set up the CVS backend.
The CVS backend, the central store for all your code, is called the repository. Whereas in Git you would typically have a repository for every project, in CVS the repository holds all of your projects. There is one central repository for everything, though there are ways to work with only a project at a time.
To create a local repository, you run the init command. You would do this somewhere global like your home directory.
$ cvs -d ~/sandbox init
CVS allows you to pass options to either the cvs command itself or to the init subcommand. Options that appear after the cvs command are global in nature, while options that appear after the subcommand are specific to the subcommand. In this case, the -d flag is global. Here it happens to tell CVS where we want to create our repository, but in general the -d flag points to the location of the repository we want to use for any given action. It can be tedious to supply the -d flag all the time, so the CVSROOT environment variable can be set instead.
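The relationship between the -d flag and the CVSROOT environment variable can be sketched as a small lookup: an explicit -d value wins, and the environment variable is the fallback. This is a simplified Python model, not CVS' own code. The resolve_repository function and the example paths are made up for illustration, and the sketch ignores the fact that a checked-out working directory also remembers which repository it came from.

```python
import os


def resolve_repository(d_option=None, environ=None):
    """Pick a repository location: explicit -d first, then $CVSROOT."""
    environ = os.environ if environ is None else environ
    if d_option:
        return os.path.expanduser(d_option)
    if "CVSROOT" in environ:
        return os.path.expanduser(environ["CVSROOT"])
    raise ValueError("no repository given: pass -d or set CVSROOT")


# Hypothetical paths, used only to show the precedence.
print(resolve_repository("/tmp/sandbox", {"CVSROOT": "/srv/cvs"}))  # /tmp/sandbox
print(resolve_repository(None, {"CVSROOT": "/srv/cvs"}))            # /srv/cvs
```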
Since we're working locally, we've just passed a path for our -d argument, but we could also have included a hostname.
The command creates a directory called sandbox in your home directory. If you list the contents of sandbox, you'll find that it contains another directory called CVSROOT. This directory, not to be confused with the environment variable, holds administrative files for the repository.
Congratulations! You've just created your first CVS repository.
Let's say that you've decided to keep a list of your favorite colors. You are an artistically inclined but extremely forgetful person. You type up your list of colors and save it as a file called favorites.txt:
blue
orange
green
definitely not yellow
Let's also assume that you've saved your file in a new directory called colors. Now you'd like to put your favorite color list under version control, because fifty years from now it will be interesting to look back and see how your tastes changed through time.
In order to do that, you will have to import your directory as a new CVS project. You can do that using the import command:
$ cvs -d ~/sandbox import -m "" colors colors initial
N colors/favorites.txt
No conflicts created by this import
Here we are specifying the location of our repository with the -d flag again. The remaining arguments are passed to the import subcommand. We have to provide a message, but here we don't really need one, so we've left it blank. The next argument, colors, specifies the name of our new directory in the repository; here we've just used the same name as the directory we are in. The last two arguments specify the vendor tag and the release tag respectively. We'll talk more about tags in a minute.
You've just pulled your "colors" project into the CVS repository. There are a couple different ways to go about bringing code into CVS, but this is the method recommended by Pragmatic Version Control Using CVS, the Pragmatic Programmer book about CVS. What makes this method a little awkward is that you then have to check out your work fresh, even though you've already got an existing colors directory. Instead of using that directory, you're going to delete it and then check out the version that CVS already knows about:
$ cvs -d ~/sandbox co colors
cvs checkout: Updating colors
U colors/favorites.txt
This will create a new directory, also called colors. In this directory you will find your original favorites.txt file along with a directory called CVS. The CVS directory is basically CVS' equivalent of the .git directory in every Git repository.
Get ready for a trip.
Just like Git, CVS has a status subcommand:
$ cvs status
cvs status: Examining .
===================================================================
File: favorites.txt Status: Up-to-date
Working revision: 1.1.1.1 2018-07-06 19:27:54 -0400
Repository revision: 1.1.1.1 /Users/sinclairtarget/sandbox/colors/favorites.txt,v
Commit Identifier: fD7GYxt035GNg8JA
Sticky Tag: (none)
Sticky Date: (none)
Sticky Options: (none)
This is where things start to look alien. CVS doesn't have commit objects. In the above, there is something called a "Commit Identifier," but this might be only a relatively recent addition; no mention of a "Commit Identifier" appears in Pragmatic Version Control Using CVS, which was published in 2003. (The last update to CVS was released in 2008.[5])
Whereas with Git you'd talk about the version of a file associated with commit 45de392, in CVS files are versioned separately. The first version of your file is version 1.1, the next version is 1.2, and so on. When branches are involved, extra numbers are appended, so you might end up with something like the 1.1.1.1 above, which appears to be the default in our case even though we haven't created any branches.
If you were to run cvs log (equivalent to git log) in a project with lots of files and commits, you'd see an individual history for each file. You might have a file at version 1.2 and a file at version 1.14 in the same project.
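Per-file versioning is easy to model. The Python sketch below is a toy, not a description of CVS internals: the ToyRepository class is invented, it only covers mainline revisions of the form 1.N, and it ignores branch numbers such as 1.1.1.1 entirely.

```python
class ToyRepository:
    """Toy model: every file carries its own revision history."""

    def __init__(self):
        self.history = {}  # file name -> list of contents, one per revision

    def commit(self, working_files):
        """Record a new revision only for files whose content changed."""
        committed = []
        for name, content in working_files.items():
            revisions = self.history.setdefault(name, [])
            if not revisions or revisions[-1] != content:
                revisions.append(content)
                committed.append(name)
        return committed

    def revision(self, name):
        """Mainline revisions are numbered 1.1, 1.2, 1.3, and so on."""
        return f"1.{len(self.history[name])}"


repo = ToyRepository()
repo.commit({"favorites.txt": "blue\n", "README": "My colors\n"})
repo.commit({"favorites.txt": "blue\ncyan\n", "README": "My colors\n"})
repo.commit({"favorites.txt": "blue\ncyan\npink\n", "README": "My colors\n"})
print(repo.revision("favorites.txt"))  # 1.3
print(repo.revision("README"))         # 1.1
```

Three commits touched favorites.txt but only the first touched README, so the two files end up at different revisions. There is no single number that identifies the state of the whole project.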
Let's go ahead and make a change to version 1.1 of our favorites.txt file:
blue
orange
green
+cyan
definitely not yellow
Once we've made the change, we can run cvs diff to see what CVS thinks we've done:
$ cvs diff
cvs diff: Diffing .
Index: favorites.txt
===================================================================
RCS file: /Users/sinclairtarget/sandbox/colors/favorites.txt,v
retrieving revision 1.1.1.1
diff -r1.1.1.1 favorites.txt
3a4
> cyan
CVS recognizes that we added a new line containing the color "cyan" to the file. (Actually, it says we've made changes to the "RCS" file; you can see that CVS never fully escaped its original association with RCS.) The diff we are being shown is the diff between the copy of favorites.txt in our working directory and the 1.1.1.1 version stored in the repository.
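If the terse "3a4" header in that output looks cryptic, it is the classic diff notation: after line 3 of the old file, add what becomes line 4 of the new file. The following Python sketch reproduces that notation using the standard difflib module. It is my own reconstruction of the output format for illustration, not the code CVS runs, and normal_diff is a made-up name.

```python
from difflib import SequenceMatcher


def _span(start, end):
    """Format a 1-based inclusive line range as 'N' or 'N,M'."""
    return str(start) if start == end else f"{start},{end}"


def normal_diff(old, new):
    """Return the lines of a classic ("normal" format) diff."""
    out = []
    opcodes = SequenceMatcher(a=old, b=new, autojunk=False).get_opcodes()
    for tag, i1, i2, j1, j2 in opcodes:
        if tag == "equal":
            continue
        if tag == "insert":
            # "3a4": after old line 3, add what is now new line 4.
            out.append(f"{i1}a{_span(j1 + 1, j2)}")
        elif tag == "delete":
            # "3d2": old line 3 is gone; the files line up again after new line 2.
            out.append(f"{_span(i1 + 1, i2)}d{j1}")
        else:  # replace
            out.append(f"{_span(i1 + 1, i2)}c{_span(j1 + 1, j2)}")
        out.extend(f"< {line}" for line in old[i1:i2])
        if tag == "replace":
            out.append("---")
        out.extend(f"> {line}" for line in new[j1:j2])
    return out


old = ["blue", "orange", "green", "definitely not yellow"]
new = ["blue", "orange", "green", "cyan", "definitely not yellow"]
print("\n".join(normal_diff(old, new)))
# 3a4
# > cyan
```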
In order to update the version stored in the repository, we have to commit the change. In Git, this would be a multi-step process. We'd have to stage the change so that it appears in our index. Then we'd commit the change. Finally, to make the change visible to anyone else, we'd have to push the commit up to the origin repository.
In CVS, all of these things happen when you run cvs commit. CVS just bundles up all the changes it can find and puts them in the repository:
$ cvs commit -m "Add cyan to favorites."
cvs commit: Examining .
/Users/sinclairtarget/sandbox/colors/favorites.txt,v <-- favorites.txt
new revision: 1.2; previous revision: 1.1
I'm so used to Git that this strikes me as terrifying. Without an opportunity to stage changes, any old thing that you've touched in your working directory might end up as part of the public repository. Did you passive-aggressively rewrite a coworker's poorly implemented function out of cathartic necessity, never intending for him to know? Too bad, he now thinks you're a dick. You also can't edit your commits before pushing them, since a commit is a push. Do you enjoy spending 40 minutes repeatedly running git rebase -i until your local commit history flows like the derivation of a mathematical proof? Sorry, you can't do that here, and everyone is going to find out that you don't actually write your tests first.
But I also now understand why so many people find Git needlessly complicated. If cvs commit is what you were used to, then I'm sure staging and pushing changes would strike you as a pointless chore.
When people talk about Git being a "distributed" system, this is primarily the difference they mean. In CVS, you can't make commits locally. A commit is a submission of code to the central repository, so it's not something you can do without a connection. All you've got locally is your working directory. In Git, you have a full-fledged local repository, so you can make commits all day long even while disconnected. And you can edit those commits, revert, branch, and cherry-pick as much as you want, without anybody else having to know.
Since commits were a bigger deal, CVS users often made them infrequently. Commits would contain as many changes as today we might expect to see in a ten-commit pull request. This was especially true if commits triggered a CI build and an automated test suite.
If we now run cvs status, we can see that we have a new version of our file:
$ cvs status
cvs status: Examining .
===================================================================
File: favorites.txt Status: Up-to-date
Working revision: 1.2 2018-07-06 21:18:59 -0400
Repository revision: 1.2 /Users/sinclairtarget/sandbox/colors/favorites.txt,v
Commit Identifier: pQx5ooyNk90wW8JA
Sticky Tag: (none)
Sticky Date: (none)
Sticky Options: (none)
As mentioned above, in CVS you can edit a file that someone else is already editing. That was CVS' big improvement on RCS. What happens when you need to bring your changes back together?
Let's say that you have invited some friends to add their favorite colors to your list. While they are adding their colors, you decide that you no longer like the color green and remove it from the list.
When you go to commit your changes, you might discover that CVS notices a problem:
$ cvs commit -m "Remove green"
cvs commit: Examining .
cvs commit: Up-to-date check failed for `favorites.txt'
cvs [commit aborted]: correct above errors first!
It looks like your friends committed their changes first. So your version of favorites.txt is not up-to-date with the version in the repository. If you run cvs status, you'll see that your local copy of favorites.txt is version 1.2 with some local changes, but the repository version is 1.3:
$ cvs status
cvs status: Examining .
===================================================================
File: favorites.txt Status: Needs Merge
Working revision: 1.2 2018-07-07 10:42:43 -0400
Repository revision: 1.3 /Users/sinclairtarget/sandbox/colors/favorites.txt,v
Commit Identifier: 2oZ6n0G13bDaldJA
Sticky Tag: (none)
Sticky Date: (none)
Sticky Options: (none)
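The up-to-date check is the heart of the optimistic model: nobody is locked out while editing, so the repository has to compare revisions when a commit arrives. Here is a toy Python version of that check for a single file. The class names and the exception are invented for this sketch, and real CVS does considerably more bookkeeping.

```python
class UpToDateCheckFailed(Exception):
    """The working copy is based on an older revision than the repository's."""


class CentralRepo:
    """Toy central repository tracking a single file."""

    def __init__(self, content):
        self.revisions = [content]  # index 0 holds revision 1.1

    @property
    def head(self):
        return len(self.revisions)  # 2 means revision 1.2, and so on

    def checkout(self):
        return WorkingCopy(self, self.head, self.revisions[-1])

    def commit(self, base, content):
        # Optimistic model: check for staleness at commit time, not edit time.
        if base != self.head:
            raise UpToDateCheckFailed(
                f"working copy is 1.{base}, repository is 1.{self.head}"
            )
        self.revisions.append(content)
        return self.head


class WorkingCopy:
    def __init__(self, repo, base, content):
        self.repo, self.base, self.content = repo, base, content

    def commit(self):
        self.base = self.repo.commit(self.base, self.content)


repo = CentralRepo("blue\ngreen\n")
you = repo.checkout()
friend = repo.checkout()

friend.content += "pink\n"
friend.commit()                 # succeeds: the repository moves to 1.2

you.content = "blue\n"          # you removed green in the meantime
try:
    you.commit()
    rejected = False
except UpToDateCheckFailed:
    rejected = True             # your copy is still based on 1.1
```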
You can run cvs diff to see exactly how your working copy, which is based on 1.2, differs from 1.3:
$ cvs diff -r HEAD favorites.txt
Index: favorites.txt
===================================================================
RCS file: /Users/sinclairtarget/sandbox/colors/favorites.txt,v
retrieving revision 1.3
diff -r1.3 favorites.txt
3d2
< green
7,10d5
<
< pink
< hot pink
< bubblegum pink
It seems that our friends really like pink. In any case, they've edited a different part of the file than we have, so the changes are easy to merge. CVS can do that for us when we run cvs update, which is similar to git pull:
$ cvs update
cvs update: Updating .
RCS file: /Users/sinclairtarget/sandbox/colors/favorites.txt,v
retrieving revision 1.2
retrieving revision 1.3
Merging differences between 1.2 and 1.3 into favorites.txt
M favorites.txt
If you now take a look at favorites.txt, you'll find that it has been modified to include the changes that your friends made to the file. Your changes are still there too. Now you are free to commit the file:
$ cvs commit
cvs commit: Examining .
/Users/sinclairtarget/sandbox/colors/favorites.txt,v <-- favorites.txt
new revision: 1.4; previous revision: 1.3
The end result is what you'd get in Git by running git pull --rebase. Your changes have been added on top of your friends' changes. There is no "merge commit."
Sometimes, changes to the same file might be incompatible. If your friends had changed "green" to "olive," for example, that would have conflicted with your change removing "green" altogether. In the early days of CVS, this was exactly the kind of case that caused people to worry that CVS wasn't safe; RCS' pessimistic locking ensured that such a case could never arise. But CVS guarantees safety by making sure that nobody's changes get overwritten automatically. You have to tell CVS which change you want to keep going forward, so when you run cvs update, CVS marks up the file with both changes in the same way that Git does when Git detects a merge conflict. You then have to manually edit the file and pick the change you want to keep.
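Both outcomes, the clean merge and the conflict, fall out of the same idea: compare each side against the revision they both started from, then combine the edits unless they touch the same lines. The Python sketch below is a deliberately simplified line-based three-way merge built on difflib. It is not the algorithm CVS uses, the function names are mine, and instead of writing conflict markers into the file it simply raises an exception.

```python
from difflib import SequenceMatcher


class MergeConflict(Exception):
    """Both sides changed the same region of the base file."""


def edits(base, other):
    """List the (start, end, replacement) edits that turn base into other."""
    matcher = SequenceMatcher(a=base, b=other, autojunk=False)
    return [
        (i1, i2, tuple(other[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def merge(base, mine, theirs):
    """Merge two sets of line edits made against the same base revision."""
    # Using a set drops edits that both sides made identically.
    combined = sorted(set(edits(base, mine)) | set(edits(base, theirs)))
    for previous, current in zip(combined, combined[1:]):
        overlaps = current[0] < previous[1]
        same_spot = current[0] == previous[0]
        if overlaps or same_spot:
            raise MergeConflict(f"conflicting edits near line {current[0] + 1}")
    merged = list(base)
    # Apply from the bottom up so earlier line numbers stay valid.
    for start, end, replacement in reversed(combined):
        merged[start:end] = replacement
    return merged


base = ["blue", "orange", "green", "cyan"]
mine = ["blue", "orange", "cyan"]                     # removed green
theirs = ["blue", "orange", "green", "cyan", "pink"]  # added pink
print(merge(base, mine, theirs))  # ['blue', 'orange', 'cyan', 'pink']
```

Passing a version in which "green" became "olive" as the third argument raises MergeConflict, because that edit and the removal of "green" both target the same line of the base.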
The interesting thing to note here is that merge conflicts have to be fixed before you can commit. This is another consequence of CVS' centralized nature. In Git, you don't have to worry about resolving merges until you push the commits you've got locally.
Since CVS doesn't have easily addressable commit objects, the only way to group a collection of changes is to mark a particular working directory state with a tag.
Creating a tag is easy:
$ cvs tag VERSION_1_0
cvs tag: Tagging .
T favorites.txt
You'll later be able to return files to this state by running cvs update and passing the tag to the -r flag:
$ cvs update -r VERSION_1_0
cvs update: Updating .
U favorites.txt
Because you need a tag to rewind to an earlier working directory state, CVS encourages a lot of preemptive tagging. Before major refactors, for example, you might create a BEFORE_REFACTOR_01 tag that you could later use if the refactor went wrong. People also used tags if they wanted to generate project-wide diffs. Basically, all the things we routinely do today with commit hashes had to be anticipated and planned for with CVS, since you needed to have the tags available already.
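One way to picture a tag is as a named table recording which revision each file was at when the tag was made. The Python sketch below models that, assuming per-file histories like the ones described earlier. TaggedRepo is an invented class, and as a simplification it tags the latest revision of every file in the repository rather than whatever happens to be in a working directory.

```python
class TaggedRepo:
    """Toy repository with per-file histories and project-wide tags."""

    def __init__(self):
        self.history = {}  # file name -> list of contents (index 0 is 1.1)
        self.tags = {}     # tag name -> {file name: index into its history}

    def commit(self, name, content):
        self.history.setdefault(name, []).append(content)

    def tag(self, tag_name):
        """Remember the current revision of every file under one name."""
        self.tags[tag_name] = {
            name: len(revisions) - 1 for name, revisions in self.history.items()
        }

    def checkout(self, tag_name):
        """Return each file's content as it was when the tag was made."""
        return {
            name: self.history[name][index]
            for name, index in self.tags[tag_name].items()
        }


repo = TaggedRepo()
repo.commit("favorites.txt", "blue\n")
repo.commit("README", "My colors\n")
repo.tag("VERSION_1_0")
repo.commit("favorites.txt", "blue\npink\n")  # history moves on after the tag

snapshot = repo.checkout("VERSION_1_0")
print(snapshot["favorites.txt"] == "blue\n")  # True
```

Without the tag, nothing in this model ties revision 1.1 of favorites.txt to revision 1.1 of README, which is why the tagging has to happen ahead of time.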
Branches can be created in CVS, sort of. Branches are just a special kind of tag:
$ cvs rtag -b TRY_EXPERIMENTAL_THING colors
cvs rtag: Tagging colors
That only creates the branch (in full view of everyone, by the way), so you still need to switch to it using cvs update:
$ cvs update -r TRY_EXPERIMENTAL_THING
The above commands switch onto the new branch in your current working directory, but Pragmatic Version Control Using CVS actually advises that you create a new directory to hold your new branch. Presumably its authors found switching directories easier than switching branches in CVS.
Pragmatic Version Control Using CVS also advises against creating branches off of an existing branch. Its authors recommend only creating branches off of the mainline branch, which in Git is known as master. In general, branching was considered an "advanced" CVS skill. In Git, you might start a new branch for almost any trivial reason, but in CVS branching was typically used only when really necessary, such as for releases.
A branch could later be merged back into the mainline using cvs update and the -j flag:
$ cvs update -j TRY_EXPERIMENTAL_THING
In 2007, Linus Torvalds gave a talk about Git at Google. Git was very new then, so the talk was basically an attempt to persuade a roomful of skeptical programmers that they should use Git, even though Git was so different from anything then available. If you haven't already seen the talk, I highly encourage you to watch it. Linus is an entertaining speaker, even if he never fails to be his brash self. He does an excellent job of explaining why the distributed model of version control is better than the centralized one. A lot of his criticism is reserved for CVS in particular.
Git is a complex tool. Learning it can be a frustrating experience. But I'm also continually amazed at the things that Git can do. In comparison, CVS is simple and straightforward, though often unable to do many of the operations we now take for granted. Going back and using CVS for a while is an excellent way to find yourself with a new appreciation for Git's power and flexibility. It illustrates well why understanding the history of software development can be so beneficial—picking up and re-examining obsolete tools will teach you volumes about the why behind the tools we use today.
I've been told that there are many organizations, particularly risk-averse organizations that do things like make medical device software, that still use CVS. Programmers in these organizations have developed little tricks for working around CVS' limitations, such as making a new branch for almost every change to avoid committing directly to HEAD. (Thanks to Michael Kohne for pointing this out.)
Originally posted at Two-Bit History under CC BY-SA 4.0 by Sinclair Target.
[1] "2015 Developer Survey," Stack Overflow, accessed July 7, 2018, https://insights.stackoverflow.com/survey/2015#tech-sourcecontrol.
[2] Eric Sink, "A History of Version Control," Version Control By Example, 2011, accessed July 7, 2018, https://ericsink.com/vcbe/html/history_of_version_control.html.
[3] Dick Grune, "Concurrent Versions System CVS," dickgrune.com, accessed July 7, 2018, https://dickgrune.com/Programs/CVS.orig/#History.
[4] "Tech Talk: Linus Torvalds on Git," YouTube, May 14, 2007, accessed July 7, 2018, https://www.youtube.com/watch?v=4XpnKHJAok8.
[5] "Concurrent Versions System - News," Savannah, accessed July 7, 2018, http://savannah.nongnu.org/news/?group=cvs.