Why Git

First steps
Object store
Branches
Tags
Disk space and speed
The Index

This page serves as an introduction to the underlying concepts of the git version control system.

First steps

First, I'll let Randal Schwartz do some talking for me.

Next, something for the Computer Scientists among us: <http://eagain.net/articles/git-for-computer-scientists/>.

Object store

Every commit made to a project, along with the file contents and directory trees it records, is stored as an object in the git object store and identified by its SHA-1 hash. This object store resides in the .git/objects directory.
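To make the idea of "identified by its SHA-1 hash" concrete, here is a minimal Python sketch that computes the ID git gives to a file's contents (a "blob"). It assumes git's usual object encoding: a header of the form "blob SIZE", a NUL byte, then the raw contents. The function name blob_id is made up for this example; commits are hashed the same way, with a "commit" header instead.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Return the SHA-1 object ID git assigns to these file contents."""
    # Git hashes a small header plus the contents, not the contents alone.
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


if __name__ == "__main__":
    # Should agree with `git hash-object` run on a file with the same bytes.
    print(blob_id(b"hello world\n"))
```

Because the ID depends only on the contents, two identical files anywhere in the history share a single object in the store.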

Branches

Multiple git branches (also called "heads") can share the same working tree, and also the same object store. Since branches are just "symlinks" to a commit object, they are very light. The default branch is called master.

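If you look at .git/refs/heads/master in any git checkout, you can see what I mean by "symlink". This file contains only a SHA-1 ID, which git resolves to a file under .git/objects: the first two characters of the ID name a directory, and the remaining 38 characters name the file inside it. (Once a repository has been packed, refs and objects may live in pack files instead, but the idea is the same.)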
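The lookup can be sketched in a few lines of Python. This is only an illustration of the file layout for unpacked ("loose") refs and objects; the helper names loose_object_path and resolve_branch are my own, and packed refs or pack files are deliberately not handled.

```python
from pathlib import Path

HEX_DIGITS = set("0123456789abcdef")


def loose_object_path(git_dir: str, object_id: str) -> Path:
    """Map a 40-character SHA-1 ID to its loose-object file under objects/."""
    if len(object_id) != 40 or not set(object_id) <= HEX_DIGITS:
        raise ValueError(f"not a full SHA-1 ID: {object_id!r}")
    # First two characters: directory name. Remaining 38: file name.
    return Path(git_dir) / "objects" / object_id[:2] / object_id[2:]


def resolve_branch(git_dir: str, branch: str) -> Path:
    """Follow refs/heads/BRANCH to the object file it points at."""
    ref_file = Path(git_dir) / "refs" / "heads" / branch
    object_id = ref_file.read_text().strip()
    return loose_object_path(git_dir, object_id)
```

For example, resolve_branch(".git", "master") returns the path of the commit object that master currently points to, provided neither the ref nor the object has been packed.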

Having branches defined this way makes it very easy to view the history (including all branches) of an entire project using the gitweb interface: all I have to do is propagate the .git directory to the gitweb server. See, for example, <http://git.savannah.gnu.org/gitweb/?p=erc.git>, which has both an emacs23 branch and a master branch.

To switch to an existing branch named BRANCH-NAME, just do:

git checkout BRANCH-NAME

This makes your working tree correspond to the BRANCH-NAME branch and causes future commits to be attached to that branch.
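Behind the scenes, git remembers which branch is checked out in the file .git/HEAD. The sketch below reads that file, assuming it holds a line of the form "ref: refs/heads/BRANCH-NAME" while a branch is checked out and a bare SHA-1 ID otherwise (a "detached" HEAD). The function name current_branch is made up for this example.

```python
from pathlib import Path
from typing import Optional


def current_branch(git_dir: str) -> Optional[str]:
    """Return the checked-out branch name, or None if HEAD is detached."""
    head = (Path(git_dir) / "HEAD").read_text().strip()
    prefix = "ref: refs/heads/"
    if head.startswith(prefix):
        return head[len(prefix):]
    # Otherwise HEAD holds a raw object ID rather than a branch reference.
    return None
```

After git checkout BRANCH-NAME, current_branch(".git") should return BRANCH-NAME, which is why new commits get attached to that branch.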

Remote branches are especially useful. Instead of placing upstream source into a separate directory, you can store its objects in the same directory as the rest of your stuff. Remote branches are like normal branches, except their symlinks are placed in .git/refs/remotes.

Each remote location has its own name. The place from where you initially clone a repository is called origin. Locations can be added by either editing .git/config manually or using the git remote command.
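To show what a remote looks like in .git/config, here is a rough Python sketch that pulls the name and URL of each remote out of the file's text. The sample configuration and its example.org URL are invented for illustration, the function name remote_urls is my own, and the parser only understands the simple section and key = value layout shown here, not every form git accepts.

```python
import re

# Matches section headers such as: [remote "origin"]
REMOTE_SECTION = re.compile(r'^\[remote "(?P<name>[^"]+)"\]$')


def remote_urls(config_text: str) -> dict:
    """Return {remote name: url} for each [remote "..."] section."""
    remotes = {}
    current = None  # name of the remote section we are inside, if any
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            match = REMOTE_SECTION.match(line)
            current = match.group("name") if match else None
        elif current is not None and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key == "url":
                remotes[current] = value
    return remotes


# Hypothetical config contents, similar to what a fresh clone contains.
SAMPLE_CONFIG = """\
[core]
    repositoryformatversion = 0
[remote "origin"]
    url = https://example.org/project.git
    fetch = +refs/heads/*:refs/remotes/origin/*
"""

if __name__ == "__main__":
    print(remote_urls(SAMPLE_CONFIG))
```

Adding a location with git remote amounts to adding another such section, which is why editing the file by hand works just as well.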

Tags

Tags are often used to describe versioned releases of code, like v5.2. They make it easy to find one particular snapshot of the code. Tag objects consist of the tag name, the tagger, the creation date, the object being tagged (by default, the commit at the current head), a message, and (optionally) a GPG signature.

Tag objects are placed in the same object store as commits. Much like the concept of a branch, git consults .git/refs/tags/ to determine which tags are currently available. The contents of that directory are just "symlinks" to tag objects (or, for lightweight tags, directly to commits).
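For the curious, the body of a tag object is plain text: a few header lines, a blank line, then the message. The Python sketch below splits such a body into its fields. The sample tag, including its object ID and tagger, is entirely made up, and parse_tag is a name I chose for this example; for a signed tag the signature simply appears at the end of the message.

```python
def parse_tag(payload: str) -> dict:
    """Split the text of a tag object into header fields and a message."""
    header_text, _, message = payload.partition("\n\n")
    fields = {}
    for line in header_text.splitlines():
        # Each header line is "KEY VALUE", e.g. "type commit".
        key, _, value = line.partition(" ")
        fields[key] = value
    fields["message"] = message
    return fields


# Hypothetical tag body; the ID and tagger are invented for illustration.
SAMPLE_TAG = (
    "object 0123456789abcdef0123456789abcdef01234567\n"
    "type commit\n"
    "tag v5.2\n"
    "tagger A. Hacker <hacker@example.org> 1200000000 +0000\n"
    "\n"
    "Release 5.2\n"
)

if __name__ == "__main__":
    print(parse_tag(SAMPLE_TAG))
```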

Disk space and speed

Git uses a remarkably small amount of disk space, due to the efficient way it "packs" objects: similar objects are stored as compressed differences against one another rather than as full copies of every version.

Git strives for most commands to take under 5 seconds to execute, even in a large project with a lot of history like the Linux kernel.

The Index

Unlike most other version control systems, git has the concept of an "index", which serves as a cache between the last commit and the working directory. It contains all of the information used when making a new commit. You can either ignore it completely by giving the git commit command the -a option, or make use of it. Using the -a option commits changes to all of the files in the working directory that git knows about. Git knows about files which have been added via git add.

To add a file to the index, use

git add NAME-OF-FILE

Note that if you make changes to the file after adding it, those changes won't automatically make it to the index. To include them, either run git add on the file again or use -a when you commit.

This lets you be very specific about the files to include in this commit. Much like darcs, you can choose to interactively specify which changes should be committed by doing:

git add --interactive

Or do this and the commit all in one step with:

git commit --interactive
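The way the index sits between the working directory and a commit can be modelled with a toy Python class. This is only a sketch of the behaviour described in this section, not git's actual index format: ToyRepo and its attributes are invented names, and a "commit" here is just a saved copy of the index.

```python
import hashlib


def blob_id(data: bytes) -> str:
    """SHA-1 ID git would assign to these file contents."""
    header = f"blob {len(data)}\0".encode("ascii")
    return hashlib.sha1(header + data).hexdigest()


class ToyRepo:
    """Toy model: a working tree, an index, and a list of commits."""

    def __init__(self):
        self.worktree = {}  # path -> current file contents (bytes)
        self.index = {}     # path -> blob ID staged for the next commit
        self.commits = []   # each commit is a snapshot of the index

    def add(self, path: str) -> None:
        """Like `git add PATH`: stage the file's contents as they are now."""
        self.index[path] = blob_id(self.worktree[path])

    def commit(self, all_tracked: bool = False) -> dict:
        """Like `git commit`, or `git commit -a` when all_tracked is True."""
        if all_tracked:
            # Refresh every file git already knows about; untracked
            # files in the working tree are left alone.
            for path in list(self.index):
                if path in self.worktree:
                    self.index[path] = blob_id(self.worktree[path])
                else:
                    del self.index[path]
        snapshot = dict(self.index)
        self.commits.append(snapshot)
        return snapshot
```

In this model, editing a file after add() and then calling commit() records the older, staged contents, while commit(all_tracked=True) picks up the edit, which mirrors the difference between plain git commit and git commit -a.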
