As some of you might know, in November we held our first session of Zen Chats, a meeting in which members of the Zentyal Team present something they find interesting to their co-workers. For the first session we had three talks: Nacho spoke to us about the company from a strategic point of view, Quique explained the architecture of Zentyal Cloud in detail, and finally I tried to get my mates to love git a little more.

My talk was titled “Top 5 advices for using git” and was organized in five blocks of useful, more or less well-known information about this version control system, which some of the Zentyal developers use frequently. Below is a summary of the presentation, which I wanted to share with you in case you find it useful!

1. Some things about git

Git is a distributed version control system: every node involved with a hosted project can act both as a client and as a server. This implies that in git “to clone” really means “to clone”: when you clone a source repository you copy everything from its location, including all the history, so after cloning, a local repository contains all the information it needs to be a server itself. This distributed model also lets different branches track different origins, and lets you include other git projects as subprojects. Going deeper, a git repository is made up of all the raw data with its meta-information, plus a representation of one state of the files (the working copy). This is the opposite of a centralized repository, where the data and meta-information live on the server side, and the representation of the files (and maybe some meta-information) lives on the client side. A repository that is only going to act as a server doesn’t need to check out the files at all; such repositories are called “bare” repositories.
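To make the “clone really means clone” idea more concrete, here is a small sketch. The directory names, the file name and the demo identity are made up for the example, and everything happens in a throwaway temporary directory.

```shell
# Build a small source repository with two commits (all names are hypothetical).
work=$(mktemp -d)
git init -q "$work/source"
cd "$work/source"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"
echo "second" >> notes.txt
git commit -q -a -m "Second commit"

# A clone carries the complete history, not just the latest files.
git clone -q "$work/source" "$work/copy"
source_commits=$(git -C "$work/source" rev-list --count HEAD)
copy_commits=$(git -C "$work/copy" rev-list --count HEAD)
echo "source: $source_commits commits, copy: $copy_commits commits"

# A bare clone has the history too, but no checked-out files.
git clone -q --bare "$work/source" "$work/server.git"
is_bare=$(git -C "$work/server.git" rev-parse --is-bare-repository)
echo "server.git is bare: $is_bare"
```

Both repositories should report the same number of commits, and the bare one contains only what you would normally find inside .git.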

Git is built on a very simple object model in which only four types of object are defined: commit, tree, blob and tag. This simplicity keeps the stored information simple and makes optimizations easier.

Another important point about git is that when you commit a version of the files, it stores complete snapshots of the staged files, not just their deltas (delta compression is only applied later, as a storage optimization, when objects are packed).
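If you want to see the four object types and the snapshot behaviour with your own eyes, something like the following sketch should do. The file name, tag name and identity are invented for the example.

```shell
# Throwaway repository with two versions of one file (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "line 1" > greeting.txt
git add greeting.txt
git commit -q -m "Add greeting"
echo "line 2" >> greeting.txt
git commit -q -a -m "Extend greeting"
git tag -a v1.0 -m "First release"

# Ask git for the type of each kind of object.
commit_type=$(git cat-file -t HEAD)
tree_type=$(git cat-file -t 'HEAD^{tree}')
blob_type=$(git cat-file -t HEAD:greeting.txt)
tag_type=$(git cat-file -t v1.0)
echo "$commit_type $tree_type $blob_type $tag_type"

# Each commit references a complete copy of the file, not a delta.
old_lines=$(git cat-file -p HEAD~1:greeting.txt | wc -l | tr -d ' ')
new_lines=$(git cat-file -p HEAD:greeting.txt | wc -l | tr -d ' ')
echo "previous snapshot: $old_lines line(s), current snapshot: $new_lines line(s)"
```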

2. Beware of the configuration

Git includes a configuration subsystem that controls all kinds of variables, from pure user settings such as name and e-mail to more specific information such as repository origins. It is basically used with the following commands:

  • To list all the values, git config [--global] -l
  • To read a variable, git config [--global] section.variable
  • To set the value of a variable, git config [--global] section.variable value
  • To remove a variable, git config [--global] --unset section.variable

Notice that the configuration can be global to all the repositories of the user or local to one of them. The global configuration is used when the variable is not set in the local configuration, or when it’s requested explicitly with the --global flag.
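The commands above and the local/global precedence can be tried safely with a sketch like this one. It points HOME at a temporary directory so that --global doesn’t touch your real settings; the demo.color and demo.shape variables are invented for the example and mean nothing to git.

```shell
# Sandbox: a fake home directory so --global writes to a throwaway file.
work=$(mktemp -d)
export HOME="$work/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
git init -q "$work/demo"
cd "$work/demo"

# Set one variable globally and locally, and another one only globally.
git config --global demo.color blue
git config --global demo.shape circle
git config demo.color red

color=$(git config demo.color)                  # local value wins
shape=$(git config demo.shape)                  # falls back to the global value
global_color=$(git config --global demo.color)  # explicit global lookup
echo "color=$color shape=$shape global color=$global_color"

# After removing the local value, the global one shows through again.
git config --unset demo.color
color_after_unset=$(git config demo.color)
echo "color after unset=$color_after_unset"
```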

It’s important to set who the user making the changes is. This can be done by setting the variables user.name and user.email, or by using environment variables. With the second option you can also specify an author that is different from the committer. The environment variables used for this are GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL, GIT_COMMITTER_NAME and GIT_COMMITTER_EMAIL.
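As a quick illustration of the author/committer split, this sketch creates one commit using only the environment variables. The two people and the file are, of course, made up.

```shell
# Throwaway repository with no user.name / user.email configured on purpose.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
echo "patch" > fix.txt
git add fix.txt

# The environment variables apply only to this single command.
GIT_AUTHOR_NAME="Alice Author" GIT_AUTHOR_EMAIL="alice@example.com" \
GIT_COMMITTER_NAME="Bob Committer" GIT_COMMITTER_EMAIL="bob@example.com" \
git commit -q -m "Apply the fix written by Alice"

# %an/%ae are the author fields, %cn/%ce the committer fields.
identities=$(git log -1 --format='%an <%ae> | %cn <%ce>')
echo "$identities"
```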

A useful feature implemented with this configuration system is aliases, which let you define custom git commands based on built-in commands or on system commands. For example, to define an alias that serves the current repository with git serve we can do this:

git config [--global] alias.serve '!git daemon --reuseaddr --verbose --base-path=. --export-all ./.git'
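For something easier to try than a daemon, here is a sketch with two small aliases, one based on a built-in command and one based on a shell command (the leading ! marks the latter). The alias names last and ncommits are invented for the example.

```shell
# Throwaway repository with a single commit (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "hello" > readme.txt
git add readme.txt
git commit -q -m "Initial commit"

# Alias for a built-in command: no "git" prefix and no "!".
git config alias.last 'log -1 --format=%s'
# Alias for a shell command: starts with "!", quoted so the shell leaves it alone.
git config alias.ncommits '!git rev-list --count HEAD'

last_subject=$(git last)
commit_count=$(git ncommits)
echo "last commit: $last_subject ($commit_count in total)"
```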

And don’t forget the extremely useful .gitignore. In this file you can write a list of patterns, and git will ignore all the files that match any of them.
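A tiny sketch of .gitignore in action, with invented file names and two example patterns:

```shell
# Throwaway repository with a mix of files (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
mkdir build
touch app.py debug.log build/output.bin

# Ignore every .log file and the whole build directory.
printf '%s\n' '*.log' 'build/' > .gitignore

# Only files that match no pattern are reported as untracked.
untracked=$(git ls-files --others --exclude-standard)
echo "$untracked"

# check-ignore tells you which pattern is responsible for ignoring a file.
git check-ignore -v debug.log
```

The list of untracked files should contain .gitignore itself and app.py, but neither debug.log nor anything under build.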

3. History repeats itself (or not…)

One of the things that probably catches your attention when you start working with git is the “ugly” commit identifier. If you’re used to more traditional version control systems, like CVS or Subversion, you’ll be used to simple, sequential numbers that identify each commit in the order in which they were committed to the central server. As you can imagine, this mechanism doesn’t work in a distributed system, because the commits are not necessarily committed to the same repository and they cannot be chronologically ordered. So the chosen solution was to store the commits in a Directed Acyclic Graph (DAG) and identify each one by a checksum of its contents (a SHA-1 hash). This mechanism is not only used for commits in git, it’s used for every kind of object. It also avoids storing repeated objects: if you have two identical files (represented as blobs) in the repository, they will have the same identifier and they won’t be saved twice.
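The “identical content, identical identifier” part is easy to check with a sketch like this (file names invented):

```shell
# Throwaway repository with two files that have exactly the same content.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
echo "same content" > a.txt
cp a.txt b.txt

# The identifier depends only on the content, not on the file name.
id_a=$(git hash-object a.txt)
id_b=$(git hash-object b.txt)
echo "a.txt -> $id_a"
echo "b.txt -> $id_b"

# Staging both files stores a single blob in the object database.
git add a.txt b.txt
stored_objects=$(git count-objects | cut -d' ' -f1)
echo "objects stored: $stored_objects"
```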

But this so-convenient-identifier-system is not so convenient when humans have to interact with it, and for this reason, some helper features were added to git:

  • Abbreviations: When you have to refer to a commit, you only need to write its first few characters, as long as they identify it unambiguously. If more than one object starts with these characters, git will complain that the abbreviation is ambiguous.
  • Tags: As with other version control systems, with git you can tag a specific commit with a more human-friendly identifier. A branch could also be considered a special kind of tag, one that moves forward with each new commit and lets the history fork.
  • Symbols: There are some symbols and operators that can be used to refer to specific commits or ranges of commits, e.g. HEAD always refers to the last commit in the current branch, HEAD^ to the previous one and HEAD^^ to the one before that. For older commits you can use a tilde and a number, such as HEAD~5 to refer to the fifth commit before HEAD.
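Here is a sketch that exercises the three helpers on a repository with four commits. The file name, the commit messages and the stable tag are all invented for the example.

```shell
# Throwaway repository with four commits in a row (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
for n in 1 2 3 4; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

# Abbreviations: a short, unambiguous prefix resolves to the full identifier.
full_id=$(git rev-parse HEAD)
short_id=$(git rev-parse --short HEAD)
resolved_id=$(git rev-parse "$short_id")
echo "$short_id -> $resolved_id"

# Tags: a human-friendly name for a specific commit.
git tag stable HEAD~2
tagged_subject=$(git log -1 --format=%s stable)

# Symbols: HEAD^ is the parent, and HEAD^^ is the same commit as HEAD~2.
parent_subject=$(git log -1 --format=%s 'HEAD^')
caret_id=$(git rev-parse 'HEAD^^')
tilde_id=$(git rev-parse HEAD~2)
echo "stable: $tagged_subject, HEAD^: $parent_subject"
```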

There are also some commands to help the user with these identifiers. With git log you can see a list of commits with their identifiers in reverse chronological order, and git show can be used to print the contents of a specific commit, or of a file in that commit. There are also commands to browse the contents of the repository graphically, such as gitk, a complete desktop application to navigate through the commits, or git instaweb, which shows this information through a web server launched by git itself.

You can also navigate across the repository history with blame, reflog, fsck or ls-tree.
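A short sketch of git log and git show, again on an invented three-commit repository:

```shell
# Throwaway repository with three commits (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
for n in 1 2 3; do
    echo "version $n" > file.txt
    git add file.txt
    git commit -q -m "Commit $n"
done

# git log lists the newest commit first; --oneline shows the abbreviated ids.
git log --oneline
newest_subject=$(git log --format=%s | head -n 1)

# git show can print a file exactly as it was in a given commit.
oldest_content=$(git show HEAD~2:file.txt)
echo "newest commit: $newest_subject, oldest content: $oldest_content"
```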

Or change the history of the repository with these commands:

  • git commit --amend, a very useful one, allows you to correct mistakes in the last commit.
  • git reset moves HEAD to a specific commit and optionally resets the index and the working copy to match this change.
  • git rebase is a powerful command that lets you rewrite the history, e.g. to squash several commits into one, to edit or completely remove them, or to change their order.
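The first two commands can be tried without any risk in a scratch repository, for instance with this sketch (file names and messages invented). I left out git rebase because it is usually run interactively.

```shell
# Throwaway repository with two commits, the second with a sloppy message.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "one" > a.txt
git add a.txt
git commit -q -m "Add a"
echo "two" > b.txt
git add b.txt
git commit -q -m "Add b (oops, wrong mesage)"

# --amend replaces the last commit instead of adding a new one.
git commit -q --amend -m "Add b"
amended_subject=$(git log -1 --format=%s)
commits_after_amend=$(git rev-list --count HEAD)

# reset --soft moves HEAD back one commit but keeps its changes staged.
git reset -q --soft HEAD~1
commits_after_reset=$(git rev-list --count HEAD)
still_staged=$(git diff --cached --name-only)
echo "subject: $amended_subject, commits: $commits_after_amend -> $commits_after_reset, staged: $still_staged"
```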

In any case, don’t change the history if you have already shared your changes, as you could create serious conflicts for the people who have downloaded them. Remember that with great power comes great responsibility: although you can force git to do almost anything, think about what you’re doing when playing with history if you are working collaboratively.

4. Let’s Get to Work!

Until now I haven’t talked about how to do the real work that everyone does with any version control system. In git, as with other tools, the most used commands are the ones to make commits, to retrieve the commits made by other people and to see the status of our current work.

First of all, the most commonly used commands to check the current status of your work are git diff and git status. I mention them first because you should always use them before doing anything else. git diff can be used to see differences between commits, but it’s usually executed without parameters to see the changes in the working copy that are not staged yet (use git diff HEAD to compare against the last commit, or git diff --cached to see only what is staged). It’s very useful to know what you are going to commit before doing it. You can also see the current status of your working copy with git status, which shows the list of files with a special status. The most common statuses are modified, staged and untracked. A modified file, as its name says, is a file that you have modified; before committing your changes you have to stage the files that you have modified, so a staged file is a file marked to be committed; and finally, an untracked file is a file that has never been committed or staged. An important point about git status is that, like other git commands, its output includes very useful hints about the state of the repository and what you can do next.
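To see the three statuses side by side, you could run something like this sketch (file names invented):

```shell
# Throwaway repository with one committed baseline (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "base" > tracked.txt
echo "base" > staged.txt
git add tracked.txt staged.txt
git commit -q -m "Base"

echo "change" >> tracked.txt     # modified, not staged
echo "change" >> staged.txt
git add staged.txt               # modified and staged
echo "new" > untracked.txt       # never added

# The short status shows one line per file with a two-letter state code.
git status --short

# Plain "git diff" only sees unstaged changes; --cached only sees staged ones.
unstaged_files=$(git diff --name-only)
staged_files=$(git diff --cached --name-only)
echo "unstaged: $unstaged_files, staged: $staged_files"
```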

To stage a file manually you can use the command git add, but files can also be staged automatically when committing. The command git commit is used to finally commit the changes to the local repository: without parameters it commits all the staged files, and with the -a flag it automatically stages all the modified files that are already tracked before committing. You can also specify the files to stage and commit by passing them as parameters. Directories used as parameters of add or commit are staged recursively.
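A sketch of the difference between git commit -a and an explicit git add, with invented file names:

```shell
# Throwaway repository with one tracked file (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > tracked.txt
git add tracked.txt
git commit -q -m "Track file"

# Change the tracked file and create a brand new one.
echo "v2" > tracked.txt
echo "new" > extra.txt

# -a stages modified tracked files, but it never picks up untracked ones.
git commit -q -a -m "Update tracked file"
left_over=$(git status --porcelain)
echo "after commit -a: $left_over"

# New files always need an explicit add.
git add extra.txt
git commit -q -m "Add extra file"
after_add=$(git status --porcelain)
```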

Finally, you need commands to share your changes with other repositories and to retrieve the changes shared by other developers. As mentioned before, you need to clone a repository to work with it, and this can be done with the command git clone. Once the repository is cloned, you can use git pull to update your local copy with the changes committed in the remote origin. This command really does two things: first it downloads all the commits that you don’t have in your local repository (with git fetch) and then it merges them into your current branch (with git merge). If during the merge git detects that you have modified something that was also modified in the remote repository, it will mark this part as a conflict and you’ll need to solve it by hand.
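The two halves of git pull can be run separately, as in this sketch. The upstream and local directories are made-up names; @{u} is git shorthand for the upstream of the current branch.

```shell
# An "upstream" repository and a clone of it (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/upstream"
cd "$work/upstream"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Upstream commit 1"
git clone -q "$work/upstream" "$work/local"

# Somebody adds a commit upstream after we cloned.
echo "v2" >> file.txt
git commit -q -a -m "Upstream commit 2"

cd "$work/local"
# Step 1: fetch downloads the new commits but leaves our branch alone.
git fetch -q origin
before_merge=$(git rev-list --count HEAD)
fetched=$(git rev-list --count '@{u}')

# Step 2: merge brings our branch up to date with what was fetched.
git merge -q '@{u}'
after_merge=$(git rev-list --count HEAD)
echo "local commits: $before_merge -> $after_merge (fetched: $fetched)"
```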

If you want to share your changes, you’ll need to use git push, which sends all the locally created commits to a remote repository. It’s important to have the repository updated with git pull and correctly merged before pushing, as the remote repository doesn’t have any merging logic: it only accepts new commits that build directly on top of what it already has.
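This sketch shows what happens when you forget to pull first. It simulates two developers (alice and bob, both invented) sharing a bare repository, and it assumes a git recent enough to know pull --no-rebase and --no-edit.

```shell
# A seed repository, a bare "server" and two clones (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/seed"
cd "$work/seed"
git config user.name "Seed User"
git config user.email "seed@example.com"
echo "base" > base.txt
git add base.txt
git commit -q -m "Base"
git clone -q --bare "$work/seed" "$work/server.git"
git clone -q "$work/server.git" "$work/alice"
git clone -q "$work/server.git" "$work/bob"

# Alice commits and pushes first.
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
echo "alice" > alice.txt
git add alice.txt
git commit -q -m "Alice's change"
git push -q origin HEAD

# Bob commits without pulling, so his push is refused.
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"
echo "bob" > bob.txt
git add bob.txt
git commit -q -m "Bob's change"
if git push -q origin HEAD 2>/dev/null; then first_push=accepted; else first_push=rejected; fi

# After pulling (fetch + merge) the same push goes through.
git pull -q --no-rebase --no-edit
if git push -q origin HEAD 2>/dev/null; then second_push=accepted; else second_push=rejected; fi
server_commits=$(git -C "$work/server.git" rev-list --count HEAD)
echo "first push: $first_push, second push: $second_push, commits on server: $server_commits"
```

The server ends up with four commits: the base one, one from each developer and the merge commit created by Bob’s pull.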

A special mention goes to git stash, a command that saves your uncommitted changes on a temporary stack and cleans the working copy, so that you can do other tasks, such as merges, that you cannot do with a dirty workspace. When you finish these tasks you can run git stash pop to apply the stashed changes to your working copy again.
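A minimal stash round trip, in a scratch repository with an invented file:

```shell
# Throwaway repository with one commit and some work in progress.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "v1" > file.txt
git add file.txt
git commit -q -m "Base"
echo "work in progress" >> file.txt

# Stashing puts the change aside and leaves a clean working copy.
git stash -q
while_stashed=$(git status --porcelain)
stash_entries=$(git stash list | wc -l | tr -d ' ')

# ...here you would merge, pull or switch branches...

# Popping re-applies the change and removes it from the stack.
git stash pop -q
after_pop=$(git diff --name-only)
echo "stashed entries: $stash_entries, modified after pop: $after_pop"
```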

5. Branches

I’m not going to describe the branching subsystem of git in depth; I’m simply going to list some of its great advantages:

  • It’s almost free: creating a local branch is almost instantaneous, as it only writes a new reference pointing to an existing commit. No file copying, no remote connections.
  • It lets you group commits easily and safely.
  • It lets you keep a modification that is growing unexpectedly in a separate line of development.
  • It lets you manage monstrous merges quickly and safely.
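To get a feeling for how cheap branches are, this sketch counts the stored objects before and after creating one. The feature branch and the file names are invented for the example.

```shell
# Throwaway repository with a single commit (names are hypothetical).
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "stable" > app.txt
git add app.txt
git commit -q -m "Stable version"

# Creating a branch stores no new objects: it is only a new name for a commit.
objects_before=$(git count-objects | cut -d' ' -f1)
git branch feature
objects_after=$(git count-objects | cut -d' ' -f1)
head_id=$(git rev-parse HEAD)
feature_id=$(git rev-parse feature)
echo "objects: $objects_before -> $objects_after"

# Commits made on the branch stay out of the original line of development.
git checkout -q feature
echo "experiment" > experiment.txt
git add experiment.txt
git commit -q -m "Risky experiment"
git checkout -q -
original_commits=$(git rev-list --count HEAD)
feature_commits=$(git rev-list --count feature)
echo "original branch: $original_commits commit(s), feature: $feature_commits commit(s)"
```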

A whole new world to explore, do it!

Bonus track

  • Workflow cheatsheet: pull, add, diff, status, add, commit, [pull], push
  • Tip: Don’t touch the history (pull, merge, rebase, reset) with pending changes in your working copy: commit, stage or stash them first (nothing red in the status).
  • Tip: Read everything git tells you to read, especially with status and rebase, if you want to avoid disasters.

A post by Jaime Soriano