[This is a guest post from Long Nguyen. —mhartl]
Like a lot of projects in the Ruby on Rails world, the Insoshi social networking platform uses Git and GitHub to manage its open source development and contributions. In setting up the repositories for Insoshi, I’ve applied the version control experience I gained at Discover, where I was technical lead for the software configuration management (SCM) team. Since some aspects of our setup aren’t obvious if you haven’t managed large projects before, we at Insoshi decided to share the details so that other GitHub projects might benefit as well.
We’ll start by reviewing the typical Git workflow based on pull requests, then discuss some problems you might run into with a “typical” repository setup, and finally explain the details of preparing the Insoshi Git repository for collaboration.
Why Pull Requests?
Git was originally developed by Linus Torvalds to host the Linux kernel, and pull requests are the de facto standard for submitting contributions in Git because that’s what Linus does. (He talked about this in his Google Tech Talk on Git.) The concept of the pull request is straightforward: you notify someone that you’ve made an update (via email, messaging on GitHub, etc.) and let them know where to find it. They can then pull in your changes and merge them with their work.
Except for that interaction, everyone works within their own repository and on their own schedule. There’s no process waiting to be completed that blocks you from moving on to whatever you need or want to do next. And you’re not forcing anyone to drop what they’re doing right now to handle your request.
It’s all very polite. And it works well in the context of distributed development since you avoid all kinds of coordination issues.
If you want to contribute to an open source project, here’s really all that you need:
- A publicly accessible repository where your changes can be found
- A local repository for your development
Even if you’re new to Git, these both seem like pretty straightforward things to do—especially if you’re using GitHub for the public repository: your repository is just a fork of the main project repository.
Let’s set up our repository by going to the official Insoshi repository and clicking on the fork button:
I’ll need to make note of the public clone URL for the official repository and the private clone URL for my newly created fork:
- Official Insoshi public clone URL
- My fork’s private clone URL
Your local repository: The “obvious” thing to do
At this point, I’ll be tempted to go ahead and make a local clone of my fork:
$ git clone git@github.com:long/insoshi.git
and immediately get to work.
Technically, there’s nothing wrong with that. And as an individual developer starting a new project, it’s what you do, but there are several disadvantages to this seemingly straightforward approach. One of the major benefits of a distributed version control system like Git is that each repository is on an equal footing; in particular, we would like every fork to have the same master branch, so that if the “official” Insoshi repository should ever be lost there would be plenty of redundant backups. We also want it to be easy for each developer to pull in changes from the official repository; the “obvious” approach isn’t set up for that. Finally, it’s a bad idea in general to work on the master branch; experienced Git users typically work on separate development branches and then merge those branches into master when they’re done.
What we’d like is a way to connect up the local repository in a way that will
- Keep the repositories in sync so that each contains the full “official” repository
- Allow developers to pull in official updates
- Encourage working on branches other than master
In the “obvious” configuration, I’m not set up to do any of that:
- There’s no local connection to the official repository for updates
- There’s no mechanism in place to push official updates to my fork on GitHub
- We’re working directly on the master branch
Your local repository: The “right” way
Keeping the big picture in mind, here are the commands I’ve run to set up my local repository (using the GitHub id long):
$ git clone git://github.com/insoshi/insoshi.git
$ cd insoshi
$ git branch --track edge origin/edge
$ git branch long edge
$ git checkout long
$ git remote add long git@github.com:long/insoshi.git
$ git fetch long
$ git push long long:refs/heads/long
$ git config branch.long.remote long
$ git config branch.long.merge refs/heads/long
Let’s take a detailed look at what these steps accomplish.
So what does it all mean?
Create a local clone of the Insoshi repository:
$ git clone git://github.com/insoshi/insoshi.git
You should note that the Git URL for the clone references the official Insoshi repository and not the URL of my own fork (i.e., the clone URL is git://github.com/insoshi/insoshi.git instead of git@github.com:long/insoshi.git). This way, the official repository is the default remote (aka ‘origin’), and the local master branch tracks the official master.
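To see what the clone actually set up, you can inspect the remote and the tracking configuration. The sketch below is only an illustration: it builds a throwaway local repository (the hypothetical paths seed and official.git) as a stand-in for the official Insoshi repository so that it runs without network access.

```shell
set -e
# Throwaway identity so the demo commit works on any machine.
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Assumption: a local bare repository stands in for the official one.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
cd "$demo"
git clone -q --bare seed official.git

# Clone from the "official" stand-in, as in step one.
git clone -q official.git insoshi
cd insoshi

# The only remote is 'origin', and master tracks origin's master.
remotes=$(git remote)
master_remote=$(git config branch.master.remote)
master_merge=$(git config branch.master.merge)
echo "remotes: $remotes"
echo "master tracks: $master_remote $master_merge"
```

Against the real repository, the same two ‘git config’ lookups should report origin and refs/heads/master right after the clone.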
I have to change into the repository to perform additional git setup:
$ cd insoshi
Insoshi also has an ‘edge’ branch for changes that we want to make public but may require a bit more polishing before we’d consider them production-ready (in the past this has included migrating to Rails 2.1 and Sphinx/Ultrasphinx). Our typical development lifecycle looks something like
development -> edge -> master
I want to create a local tracking branch for it:
$ git branch --track edge origin/edge
Steps four and five
As I mentioned before, I’m resisting the temptation to immediately start working on the local ‘master’ and ‘edge’ branches. I want to keep those in sync with the official Insoshi repository.
I’ll keep my changes separate by creating a new branch ‘long’ that’s based off edge and checking it out:
$ git branch long edge
$ git checkout long
By the way, you can actually combine the two commands if you like, using just the ‘git checkout’ command with the -b flag:
$ git checkout -b long edge
You can name this branch anything that you want, but I’ve chosen my GitHub id so that it’s easy to identify.
I’m starting my changes off of ‘edge’ since it contains all the latest updates. Any contribution I submit a pull request for will be merged first into the official Insoshi ‘edge’ branch, to allow for public testing before it’s merged into ‘master’.
Steps six and seven
I’m finally adding the remote reference to my fork on GitHub:
$ git remote add long git@github.com:long/insoshi.git
I’ve used my GitHub id once again, this time as the remote nickname.
We should run a fetch immediately in order to sync up the local repository with the fork:
$ git fetch long
Step eight
Next, I push my new local branch up to my fork. Since it’ll be a new branch on the remote end, I need to fully specify the remote refspec:
$ git push long long:refs/heads/long
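The refspec in that command reads as source:destination: the left side is the local branch ‘long’, and the right side is the full name of the ref to create on the remote. Here is a minimal sketch of the effect, assuming local bare repositories (the hypothetical official.git and fork.git) as stand-ins for the two GitHub repositories:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Assumption: local stand-ins for the official repository and my fork.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch edge
cd "$demo"
git clone -q --bare seed official.git
git clone -q --bare official.git fork.git

# Steps one through seven from the post.
git clone -q official.git insoshi
cd insoshi
git branch --track edge origin/edge
git checkout -q -b long edge
git remote add long "$demo/fork.git"
git fetch -q long

# List the fork's branches before and after the push.
before=$(git ls-remote --heads long | cut -f2)
git push -q long long:refs/heads/long
after=$(git ls-remote --heads long | cut -f2)

echo "before:"; echo "$before"
echo "after:";  echo "$after"
```

The fork starts with only the branches it inherited from the official repository; the push adds refs/heads/long alongside them.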
Steps nine and ten
Now that the new branch is up on my fork, I want to set the branch configuration to track it:
$ git config branch.long.remote long
$ git config branch.long.merge refs/heads/long
Setting the remote lets me simply use
$ git push
to push changes on my development branch up to my fork.
Setting the merge configuration is mainly for completeness at this point. But if you end up working on more than one machine (work/home, desktop/laptop, etc.), it’ll allow you to just use
$ git pull
to grab the changes you’ve pushed up to your fork.
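As an aside, Git versions from 1.7.0 on can do the push and both configuration settings in one go with the -u (--set-upstream) flag. This is an alternative to the commands above rather than what I originally ran, and the sketch again assumes local stand-in repositories (hypothetical paths) instead of GitHub:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Assumption: local stand-ins for the official repository and my fork.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch edge
cd "$demo"
git clone -q --bare seed official.git
git clone -q --bare official.git fork.git

git clone -q official.git insoshi
cd insoshi
git branch --track edge origin/edge
git checkout -q -b long edge
git remote add long "$demo/fork.git"
git fetch -q long

# Push the new branch and record the fork as its upstream in one command.
git push -q -u long long

# These are the same two settings that steps nine and ten write by hand.
branch_remote=$(git config branch.long.remote)
branch_merge=$(git config branch.long.merge)
echo "branch.long.remote = $branch_remote"
echo "branch.long.merge  = $branch_merge"
```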
Isn’t that a lot of extra work to do?
This may seem like a lot of work up front, but it’s all configuration work that you’d eventually do anyway. If you’re really that concerned about the extra typing, I’ve got a shell script for you.
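If you’d rather roll your own, the ten steps wrap up naturally in a small shell function. The following is a sketch, not the script mentioned above: the function name setup_contributor_clone and its four arguments are hypothetical, and the demo call at the end uses local stand-in repositories rather than real GitHub URLs.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: official URL, fork URL, GitHub id, target directory.
setup_contributor_clone() {
  official_url=$1
  fork_url=$2
  id=$3
  dir=$4
  git clone -q "$official_url" "$dir"          # step 1
  cd "$dir"                                    # step 2
  git branch --track edge origin/edge          # step 3
  git checkout -q -b "$id" edge                # steps 4 and 5
  git remote add "$id" "$fork_url"             # step 6
  git fetch -q "$id"                           # step 7
  git push -q "$id" "$id:refs/heads/$id"       # step 8
  git config "branch.$id.remote" "$id"         # step 9
  git config "branch.$id.merge" "refs/heads/$id"  # step 10
}

# Demo with local stand-ins (assumption: these replace the GitHub URLs).
demo=$(mktemp -d)
cd "$demo"
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch edge
cd "$demo"
git clone -q --bare seed official.git
git clone -q --bare official.git fork.git

setup_contributor_clone "$demo/official.git" "$demo/fork.git" long "$demo/insoshi"
git branch
```

For real use, you would pass the official clone URL, your fork’s private clone URL, and your GitHub id in place of the demo arguments.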
The extra work is worth the effort, because with this configuration
- My changes will be easily identifiable in my named branch
- I can easily get updates from the main Insoshi repository
- Any updates I’ve pulled into master and edge are automatically pushed up to my fork on GitHub
The last one is a bonus that comes from how ‘git push’ behaves when it isn’t given a refspec. Before Git 2.0, a plain ‘git push’ pushes up changes for all local branches that have a matching branch on the remote (the “matching” behavior). Since Git 2.0 the default is “simple”, which pushes only the current branch, so on newer versions you would set push.default to matching or name the branches explicitly. Either way, if I make it a point to pull in updates to my local master and edge but not work directly on them, my fork will match up with the official repository.
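Here is a sketch of that update cycle: pull an official change into the local edge, then let a single push carry it up to the fork. It sets push.default to matching explicitly so that it behaves the same on newer Git versions, and it assumes local stand-in repositories (hypothetical paths) in place of GitHub.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Assumption: local stand-ins for the official repository and my fork.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch edge
cd "$demo"
git clone -q --bare seed official.git
git clone -q --bare official.git fork.git

# The ten setup steps from the post.
git clone -q official.git insoshi
cd insoshi
git branch --track edge origin/edge
git checkout -q -b long edge
git remote add long "$demo/fork.git"
git fetch -q long
git push -q long long:refs/heads/long
git config branch.long.remote long
git config branch.long.merge refs/heads/long

# Simulate a new commit landing on the official edge branch.
cd "$demo/seed"
git checkout -q edge
echo "official update" >> README
git commit -q -a -m "Official update on edge"
git push -q "$demo/official.git" edge

# Pull the update into the local edge, then return to my own branch.
cd "$demo/insoshi"
git checkout -q edge
git pull -q
git checkout -q long

# With 'matching', one push updates every branch the fork also has.
git config push.default matching
git push -q

official_edge=$(git ls-remote origin refs/heads/edge | cut -f1)
fork_edge=$(git ls-remote long refs/heads/edge | cut -f1)
echo "official edge: $official_edge"
echo "fork edge:     $fork_edge"
```

If you’d rather not change push.default, naming the branches explicitly (‘git push long master edge’) gets the same result.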
So what is the benefit of all this to open source projects like Insoshi?
- The easier it is for the contributor to pull in updates, the more likely it will be that the pull request will be for code that merges easily with the latest releases (with few conflicts)
- You can tell if someone is pulling updates by looking at their master and edge branches and seeing if they match up with the latest branches on the main repository
- By getting contributors in the habit of working on branches, you’re going to get better organized code contributions
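Checking whether a fork is keeping up doesn’t require cloning it; ‘git ls-remote’ reports the commit each branch points to. The helper below is a hypothetical sketch (the name branch_in_sync is mine), demonstrated against local stand-in repositories rather than GitHub.

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: succeeds if branch $3 points to the same commit
# in the repositories at $1 and $2.
branch_in_sync() {
  a=$(git ls-remote "$1" "refs/heads/$3" | cut -f1)
  b=$(git ls-remote "$2" "refs/heads/$3" | cut -f1)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

demo=$(mktemp -d)
cd "$demo"

# Assumption: local stand-ins for the official repository and a fork.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch edge
cd "$demo"
git clone -q --bare seed official.git
git clone -q --bare official.git fork.git

# The official edge moves on after the fork was made.
cd "$demo/seed"
git checkout -q edge
echo "official update" >> README
git commit -q -a -m "Official update on edge"
git push -q "$demo/official.git" edge
cd "$demo"

for branch in master edge; do
  if branch_in_sync "$demo/official.git" "$demo/fork.git" "$branch"; then
    echo "$branch: fork matches official"
  else
    echo "$branch: fork differs from official"
  fi
done
```

In this demo the fork was made before the extra commit on edge, so master should match and edge should differ. A mismatch only says the two branches point to different commits, not which side is behind.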
Basically, the less effort that’s required to bring in code via a pull request, the sooner it can be added to the project release. And at the end of the day, that’s really what it’s all about.
Putting (pushing and pulling) it all together
Now that we’ve covered all the details, let’s go through the full set of steps needed to make a contribution to a project like Insoshi:
- Fork the Insoshi repository on GitHub:
- Follow the Git steps above or use the shell script to set up your local repository
- Check out the local branch, just to be sure:
$ git checkout long
- Make some changes (and remember your development branch is against ‘edge’) and commit them:
[make changes in a text editor]
$ git commit -m "My great contribution"
$ git push
- Go to your fork and branch at GitHub (I’m at long/insoshi @ long) and click on the pull request button:
- Tell us about what you just did and make sure “insoshi” is a recipient:
- Bask in the glory of being an open-source contributor!
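Before sending the pull request, it’s worth bringing the development branch up to date with the latest official ‘edge’, since that’s what makes the contribution merge cleanly. Here is a sketch of that last-minute sync, assuming local stand-in repositories (hypothetical paths) and a made-up FEATURE file as the contribution:

```shell
set -e
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
cd "$demo"

# Assumption: local stand-ins for the official repository and my fork.
git init -q seed
cd seed
git symbolic-ref HEAD refs/heads/master
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git branch edge
cd "$demo"
git clone -q --bare seed official.git
git clone -q --bare official.git fork.git

# The ten setup steps from the post.
git clone -q official.git insoshi
cd insoshi
git branch --track edge origin/edge
git checkout -q -b long edge
git remote add long "$demo/fork.git"
git fetch -q long
git push -q long long:refs/heads/long
git config branch.long.remote long
git config branch.long.merge refs/heads/long

# My contribution, committed on the development branch.
echo "my feature" > FEATURE
git add FEATURE
git commit -q -m "My great contribution"

# Meanwhile, the official edge branch moves on.
cd "$demo/seed"
git checkout -q edge
echo "official update" >> README
git commit -q -a -m "Official update on edge"
git push -q "$demo/official.git" edge

# Sync edge, merge it into my branch, and push the result to my fork.
cd "$demo/insoshi"
git checkout -q edge
git pull -q
git checkout -q long
git merge -q -m "Merge edge into long" edge
git push -q
```

After this, the ‘long’ branch on the fork contains both the contribution and everything currently on the official ‘edge’.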