Projects - Why Git
This page serves as an introduction to the underlying concepts of the git version control system.
First, I'll let Randal Schwartz do some talking for me.
Next, something for the Computer Scientists among us: <http://eagain.net/articles/git-for-computer-scientists/>.
Every single commit made to a project is represented by its SHA-1 hash
in the git object store. This object store resides in the
.git/objects directory.
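As a rough illustration of where these hashes come from, the Python sketch below computes an object ID the way git does: the SHA-1 of a short header (object type, content length in bytes, a NUL byte) followed by the content. The helper name git_object_id is made up for this example. Commits are hashed the same way with the type "commit"; a blob is used here only because its content is easier to write out by hand.

```python
import hashlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA-1 object ID for the given object type and raw content."""
    # The header is "<type> <size in bytes>" followed by a NUL byte.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has the same well-known ID in every SHA-1 git repository.
print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the ID depends only on the type and content, two identical objects always end up stored once under the same name.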
Multiple git branches (also called "heads") can share the same working
tree, and also the same object store. Since branches are just
"symlinks" to a commit object, they are very light. The default branch is
called master.
If you look at .git/refs/heads/master in any git checkout, you can see
what I mean by "symlink". This file contains only a SHA-1 ID, which
git resolves to .git/objects/ID, with ID split into a directory (its
first two characters) and a filename (the remaining 38 characters).
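A minimal Python sketch of that lookup, assuming the branch file and the object are both stored "loose". Git may instead keep refs in .git/packed-refs and objects in pack files, in which case these paths will not exist. Both function names are hypothetical.

```python
from pathlib import Path


def read_branch_id(git_dir: str, branch: str = "master") -> str:
    """Return the SHA-1 ID stored in <git_dir>/refs/heads/<branch>."""
    return (Path(git_dir) / "refs" / "heads" / branch).read_text().strip()


def loose_object_path(git_dir: str, object_id: str) -> Path:
    """Map an object ID to its loose-object path: first 2 hex chars, then the rest."""
    return Path(git_dir) / "objects" / object_id[:2] / object_id[2:]


# Example using the ID of the empty blob; any 40-character ID works the same way.
print(loose_object_path(".git", "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
```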
Having branches defined this way makes it very easy to view the
history (including all branches) of an entire project using the gitweb
interface: all I have to do is propagate the .git directory to the
gitweb server. See, for example,
<http://git.savannah.gnu.org/gitweb/?p=erc.git>, which has both an
emacs23 branch and a master branch.
To switch to an existing branch named BRANCH-NAME, just do:
git checkout BRANCH-NAME
This makes your working tree correspond to the BRANCH-NAME branch and
causes future commits to be attached to that branch.
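Git remembers which branch is checked out in the file .git/HEAD, which after a checkout holds a line such as "ref: refs/heads/BRANCH-NAME". The Python sketch below (function name hypothetical) interprets such a line. It only illustrates the format of that file, not what git checkout does to the working tree.

```python
from typing import Optional

BRANCH_PREFIX = "ref: refs/heads/"


def current_branch(head_contents: str) -> Optional[str]:
    """Return the branch named in the contents of .git/HEAD, or None if detached."""
    line = head_contents.strip()
    if line.startswith(BRANCH_PREFIX):
        return line[len(BRANCH_PREFIX):]
    # Otherwise HEAD holds a bare object ID (a "detached HEAD").
    return None


print(current_branch("ref: refs/heads/emacs23\n"))  # emacs23
```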
Remote branches are especially useful. Instead of placing upstream
source into a separate directory, you can store its objects in the same
directory as the rest of your stuff. Remote branches are like normal
branches, except their symlinks are placed in .git/refs/remotes.
Each remote location has its own name. The place from where you
initially clone a repository is called origin. Locations can be added
by either editing .git/config manually or using the git remote
command.
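To give an idea of what the relevant part of .git/config looks like, the Python sketch below parses a sample remote section with the standard configparser module. The URL is a placeholder, not a real repository, and remote_urls is a made-up helper. configparser only copes with simple config files like this one (git's own syntax is richer), so treat this as an illustration rather than a general parser.

```python
import configparser

# Sample section of the kind written by "git clone"; the URL is a placeholder.
SAMPLE_CONFIG = """\
[remote "origin"]
    url = https://example.org/project.git
    fetch = +refs/heads/*:refs/remotes/origin/*
"""


def remote_urls(config_text: str) -> dict:
    """Return a mapping of remote name to URL from git-config-style text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    urls = {}
    for section in parser.sections():
        # Section names look like: remote "origin"
        if section.startswith('remote "') and section.endswith('"'):
            name = section[len('remote "'):-1]
            urls[name] = parser[section]["url"]
    return urls


print(remote_urls(SAMPLE_CONFIG))  # {'origin': 'https://example.org/project.git'}
```

The fetch line is what tells git to store the remote's branches under refs/remotes/origin.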
Tags are often used to describe versioned releases of code, like v5.2.
They make it easy to find one particular snapshot of the code. Tag
objects consist of the tagger, creation date, a message, the object
being tagged (by default the current head), and (optionally) a GPG
signature.
Tag objects are placed in the same object store as commits. Much like
the concept of a branch, git consults .git/refs/tags/ to determine
which tags are currently available. The contents of that directory
are just "symlinks" to tag objects (or, for lightweight tags, directly
to commits).
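For the curious, the body of a tag object is plain text: a few header lines, a blank line, then the message. The Python sketch below splits such a body into its fields. The sample tag, including its object ID, name, and date, is invented for illustration, and parse_tag_object is a hypothetical helper. Git records the person who made the tag and the creation date together on the "tagger" line.

```python
# Invented sample body of a tag object; the ID and tagger are placeholders.
SAMPLE_TAG = (
    "object 0123456789abcdef0123456789abcdef01234567\n"
    "type commit\n"
    "tag v5.2\n"
    "tagger A U Thor <author@example.com> 1200000000 +0000\n"
    "\n"
    "Release 5.2\n"
)


def parse_tag_object(body: str) -> dict:
    """Split a tag object's text into its header fields and message."""
    headers, _, message = body.partition("\n\n")
    fields = {}
    for line in headers.splitlines():
        # Each header line is "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # For a signed tag, the GPG signature is appended to the message.
    fields["message"] = message.strip()
    return fields


tag = parse_tag_object(SAMPLE_TAG)
print(tag["tag"], "->", tag["type"], tag["object"])
```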
Git uses very little disk space, because it compresses objects and "packs" them, storing similar objects as deltas against one another.
Git strives for most commands to take under 5 seconds to execute, even in a large project with a lot of history like the Linux kernel.
Unlike most other version control systems, git has the concept of an
"index", which serves as a cache between the last commit and the
working directory. It contains all of the information used when
making a new commit. You can either ignore it completely by giving
the git commit command the -a option, or make use of it. Using
the -a option commits changes to all of the files in the working
directory that git knows about. Git knows about files which have been
added via git add.
To add a file to the index, use
git add NAME-OF-FILE
Note that if you make changes to the file after adding it, those changes
won't make it into the index unless you run git add on the file again
or use -a when you commit.
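The behavior of the index can be mimicked with a toy model in Python. This is only a sketch of the idea, with file contents as plain strings and a made-up ToyRepo class. It is not how git stores anything, and it ignores details such as deleted files.

```python
class ToyRepo:
    """Toy model of git's three areas; file contents are plain strings."""

    def __init__(self):
        self.working = {}  # files as they currently are on disk
        self.index = {}    # snapshot staged for the next commit
        self.head = {}     # snapshot recorded by the last commit

    def add(self, name):
        # "git add NAME" copies the file's current content into the index.
        self.index[name] = self.working[name]

    def commit(self, all_tracked=False):
        if all_tracked:
            # "git commit -a": re-stage every file the index already knows about.
            for name in self.index:
                if name in self.working:
                    self.index[name] = self.working[name]
        # A commit records whatever is in the index, not the working directory.
        self.head = dict(self.index)


repo = ToyRepo()
repo.working["notes.txt"] = "first draft"
repo.add("notes.txt")
repo.working["notes.txt"] = "second draft"  # edited after "git add"
repo.commit()
print(repo.head["notes.txt"])  # first draft
repo.commit(all_tracked=True)
print(repo.head["notes.txt"])  # second draft
```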
The index lets you be very specific about the files to include in a commit. Much like darcs, you can choose to interactively specify which changes should be committed by doing:
git add --interactive
Or do this and commit all in one step with:
git commit --interactive