These are (draft) general guidelines for Biopython development using git. For now we are still using CVS.
This document is meant as an outline of the way Biopython is developed. It should include all essential technical information as well as typical procedures and usage scenarios. It should be helpful for core developers, potential code contributors, testers and everybody interested in Biopython code.
This version is an unofficial draft and is subject to change.
If you have found a problem with Biopython and think you know how to fix it, we suggest following the simple route of filing a bug and describing your fix. Ideally, you would upload a patch file showing the differences between the latest version of Biopython (from our repository) and your modified version. Working with the command line tools diff and patch is a very useful skill to have, and is almost a precursor to working with a version control system.
You shouldn't go to the trouble of creating your own git fork unless you intend to make more than a simple one-off contribution.
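As a rough illustration of the diff and patch route described above, the sketch below creates a patch from two versions of a file and applies it to a third copy. The file names (original.py, modified.py, maintainer_copy.py, fix.patch) are hypothetical stand-ins, and everything happens in a temporary directory.

```shell
# Work in a throwaway directory so nothing real is touched.
work=$(mktemp -d)
cd "$work"

# "original.py" stands in for a file from the latest Biopython source,
# "modified.py" for your edited copy (both names are hypothetical).
printf 'def greet():\n    return "helo"\n' > original.py
printf 'def greet():\n    return "hello"\n' > modified.py

# diff exits with status 1 when the files differ, which is expected here.
diff -u original.py modified.py > fix.patch || true

# A maintainer applies the patch to their own copy of the original file.
cp original.py maintainer_copy.py
patch maintainer_copy.py < fix.patch
```

The resulting fix.patch is the kind of file you would attach to a bug report.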
This section gives a technical introduction to git usage, including the required software and integration with GitHub. If you want to start contributing to Biopython, you definitely need to install git and learn how to obtain a branch of Biopython. If you want to share your changes easily with others, you should also sign up for a GitHub account and read the corresponding section of the manual. Finally, if you are engaged in one of the collaborations on experimental Biopython modules, you should also look into code review and branch merging.
You will need to install Git on your computer. Git (http://git-scm.com/) is available for all major operating systems. Please use the appropriate installation method as described below.
Git is now packaged in all major Linux distributions; you should find it in your package manager.
On Debian-based distributions such as Ubuntu, you can install Git from the git-core package, e.g.,
sudo apt-get install git-core
You'll probably also want to install the following packages: gitk git-gui git-doc
Git is also packaged in RPM-based Linux distributions.
yum install gitk
should do the trick on any recent Fedora, Mandriva, or derivative distribution.
Mac OS X
Download the .dmg disk image from http://code.google.com/p/git-osx-installer/
msysGit is a port of Git that runs natively on Windows via the MinGW library. Because Git was not originally designed to run on Windows, and since this is a port, some bugs exist, though they are rarely encountered in everyday use of Git. See their website for download and installation instructions. Additionally, you can watch Scott Chacon's screencast on installing msysGit on Windows, and see this GitHub guide.
Cygwin provides a Linux-like environment for Windows. It includes access to repositories of many software packages commonly available in Linux distributions, including Git. You can find the git package under the "devel" category.
Testing your git installation
If your installation succeeded, you should be able to run
git --help
in a console window to obtain information on git usage. If this fails, you should refer to the git documentation for troubleshooting.
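A slightly more defensive check, sketched here as an optional extra, first looks for the git executable on your PATH and then reports its version. The variable name git_status is just an illustrative choice.

```shell
# Report whether a git executable is on the PATH and, if so, which version.
if command -v git >/dev/null 2>&1; then
    git_status="found: $(git --version)"
else
    git_status="git not found on PATH"
fi
echo "$git_status"
```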
Creating a GitHub account (Optional)
Once you have Git installed on your machine, you can obtain the code and start developing. Since the code is hosted at GitHub, however, you may wish to take advantage of the site's offered features by signing up for a GitHub account. While a GitHub account is completely optional and not required for obtaining the Biopython code or participating in development, a GitHub account will enable all other Biopython developers to track (and review) your changes to the code base, and will help you track other developers' contributions. This fosters a social, collaborative environment for the Biopython community.
If you don't already have a GitHub account, you can create one here (the free plan is absolutely enough). Once you have created your account, upload an SSH public key by clicking on 'Account' after logging in. For more information on generating and uploading an SSH public key, see this GitHub guide.
Working with the source code
In order to start working with the Biopython source code, you need to obtain a local clone of our git repository. In git, a clone is a complete copy of the repository along with the full version history. Thanks to compression, this is not much bigger than a single copy of the tree, but you need to accept a small overhead in terms of disk space.
There are, roughly speaking, two ways of getting the source code tree onto your machine: by simply "cloning" the repository, or by "forking" the repository on GitHub. They're not that different; in fact, both will result in a directory on your machine containing a full copy of the repository. However, if you have a GitHub account, you can make your repository a public branch of the project. If you do so, other people will be able to easily review your code, make their own branches from it or merge it back to the trunk.
Using branches on Github is the preferred way to work on new features for Biopython, so it's useful to learn it and use it even if you think your changes are not for immediate inclusion into the main trunk of Biopython. But even if you decide not to use github, you can always change this later (using the .git/config file in your branch.) For simplicity, we describe these two possibilities separately.
Cloning Biopython directly
Getting a copy of the repository (called "cloning" in Git terminology) without a GitHub account is very simple:
git clone git://github.com/biopython/biopython.git
This command creates a local copy of the entire Biopython repository on your machine (your own personal copy of the official repository with its complete history). You can now make local changes and commit them to this local copy (although we advise you to use named branches for this, and keep the master branch in sync with the official Biopython code).
If you want other people to see your changes, however, you must publish your repository to a public server yourself (e.g. on GitHub).
Forking Biopython with your GitHub account
If you are logged in to GitHub, you can go to the Biopython repository page:
and click on a button named 'Fork'. This will create a fork (basically a copy) of the official Biopython repository, publicly viewable on GitHub, but listed under your personal account. It should be visible under a URL that looks like this:
Since your new Biopython repository is publicly visible, it's considered good practice to change the description and homepage fields to something meaningful (i.e. different from the ones copied from the official repository).
If you haven't done so already, set up an SSH key and upload it to github for authentication.
Now, assuming that you have git installed on your computer, execute the following command locally on your machine. This "url" is given on the github page for your repository (if you are logged in):
git clone git@github.com:yourusername/biopython.git
Where <yourusername>, not surprisingly, stands for your GitHub username. You have just created a local copy of the Biopython repository on your machine.
You may want to also link your branch with the official distribution (see below on how to keep your copy in sync):
git remote add upstream git://github.com/biopython/biopython.git
To add additional contributors to your repository on GitHub (i.e. people you want to be able to commit to it), select 'edit' and then add them to the 'Repository Collaborators' section. You will need to know their username on GitHub.
If you haven't already done so, tell git your name and the email address you are using on github (so that your commits get matched up to your github account). For example,
git config --global user.name "David Jones"
git config --global user.email "firstname.lastname@example.org"
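To confirm what git has recorded, you can read the settings back with git config --get. The sketch below does this in a throwaway repository and deliberately omits --global so that your real settings are left alone; the sandbox directory name is an assumption made for the example.

```shell
# Create a scratch repository to experiment in.
work=$(mktemp -d)
git init -q "$work/sandbox"
cd "$work/sandbox"

# Same settings as above, but without --global, so only this sandbox is affected.
git config user.name "David Jones"
git config user.email "firstname.lastname@example.org"

# Read the values back to confirm what git will record in new commits.
configured_name=$(git config --get user.name)
configured_email=$(git config --get user.email)
echo "$configured_name <$configured_email>"
```

Running the same two git config --get commands outside the sandbox shows your global values, if you have set any.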
Making changes locally
Now you can make changes to your local repository. You can do this offline, and you can commit your changes as often as you like. In fact, you should commit as often as possible, because smaller commits are much easier to manage and document.
First of all, create a new branch to make some changes in, and switch to it:
git branch demo-branch
git checkout demo-branch
To check which branch you are on, use:
git branch
Let us assume you've made changes to the file Bio/x.py. Try this:
git status
This lists Bio/x.py as modified. To commit this change, you first need to explicitly add this file to your change-set:
git add Bio/x.py
and now you commit:
git commit -m "added feature Y in Bio.x"
Your commits in Git are local, i.e. they affect only your working branch on your computer, and not the whole Biopython tree or even your fork on GitHub. You don't need an internet connection to commit, so you can do it very often.
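The steps above can be tried safely in a throwaway repository before touching your real clone. This is a minimal sketch: the sandbox directory, the placeholder contents of Bio/x.py, and the rev-parse and rev-list checks at the end are additions for illustration only.

```shell
# Set up a scratch repository with an initial commit on master.
work=$(mktemp -d)
git init -q "$work/sandbox"
cd "$work/sandbox"
git symbolic-ref HEAD refs/heads/master   # name the initial branch master
git config user.name "David Jones"
git config user.email "firstname.lastname@example.org"
mkdir Bio
echo "# placeholder module" > Bio/x.py
git add Bio/x.py
git commit -q -m "initial import"

# Create a branch, switch to it, and confirm where we are.
git branch demo-branch
git checkout -q demo-branch
current_branch=$(git rev-parse --abbrev-ref HEAD)

# Edit the file; status lists it as modified until it is added.
echo "def feature_y(): return 42" >> Bio/x.py
git status --short
git add Bio/x.py
git commit -q -m "added feature Y in Bio.x"

# The new commit exists on demo-branch only; master is untouched.
commits_ahead=$(git rev-list --count master..demo-branch)
echo "on $current_branch, $commits_ahead commit(s) ahead of master"
```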
Pushing changes to Github
If you are using Github, and you are working on a clone of your own branch, you can very easily make your changes available for others.
Once you think your changes are stable and should be reviewed by others, you can push your changes back to the GitHub server:
git push origin demo-branch
This will not work if you have cloned directly from the official Biopython branch, since only the core developers will have write access to the main repository.
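If you would like to see what a push does without involving GitHub, the following sketch uses a local bare repository as a stand-in for your fork. The names fork.git, local and NOTES.txt are hypothetical; with a real fork, origin would point at GitHub instead.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of your fork on GitHub.
git init -q --bare fork.git
git clone -q fork.git local 2>/dev/null   # git warns that the clone is empty
cd local
git config user.name "David Jones"
git config user.email "firstname.lastname@example.org"

# Do some work on a named branch.
git checkout -q -b demo-branch
echo "notes" > NOTES.txt
git add NOTES.txt
git commit -q -m "start demo-branch"

# Publish the branch; origin is the repository we cloned from.
git push -q origin demo-branch

# ls-remote asks the remote which branches it now has.
published=$(git ls-remote --heads origin demo-branch)
echo "$published"
```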
Merging upstream changes
We recommend that you don't actually make any changes to the master branch in your local repository (or your fork on github). Instead, use named branches to do any of your own work. The advantage of this approach is that it is trivial to pull the upstream master (i.e. the official Biopython branch) into your repository.
Assuming you have issued this command (you only need to do this once):
git remote add upstream git://github.com/biopython/biopython.git
Then all you need to do is:
git checkout master
git pull upstream master
Provided you never commit any change to your local master branch, this should always be a simple fast forward merge without any conflicts. You can then deal with merging the upstream changes from your local master branch into your local branches (and you can do that offline).
If you have your repository hosted online (e.g. at github), then push the updated master branch there:
git push origin master
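The whole cycle, including the follow-up merge from master into a named branch, can be rehearsed offline. In this sketch a local repository called official stands in for the official Biopython repository, and all file and directory names are hypothetical. The --ff-only and --no-edit options are optional additions: the first makes git refuse anything other than a fast-forward, the second accepts the default merge message.

```shell
work=$(mktemp -d)
cd "$work"

# "official" stands in for the official Biopython repository.
git init -q official
cd official
git symbolic-ref HEAD refs/heads/master
git config user.name "Upstream Dev"
git config user.email "upstream@example.org"
echo "v1" > README
git add README
git commit -q -m "first upstream commit"
cd ..

# Your clone, with a named branch holding your own work.
git clone -q official local
cd local
git config user.name "David Jones"
git config user.email "firstname.lastname@example.org"
git remote add upstream "$work/official"
git checkout -q -b demo-branch
echo "my work" > feature.txt
git add feature.txt
git commit -q -m "work on demo-branch"

# Meanwhile, the official repository gains a new commit.
cd ../official
echo "v2" > README
git commit -q -a -m "second upstream commit"
cd ../local

# Update master; this is a fast-forward because master has no local commits.
git checkout -q master
git pull -q --ff-only upstream master

# Bring the refreshed master into the named branch.
git checkout -q demo-branch
git merge -q --no-edit master
```

After the final merge, demo-branch contains both your own commit and the latest upstream changes.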
Submitting changes for inclusion in Biopython
If you think your changes are worth including in the main Biopython distribution, then file an (enhancement) bug on Bugzilla, and include a link to your updated branch (i.e. your branch on GitHub, or another public Git server). You could also attach a patch on Bugzilla. If the changes are accepted, one of the Biopython developers will have to check this code into our main repository.
On GitHub itself, you can inform keepers of the main branch of your changes by sending a 'pull request' from the main page of your branch. Once the file has been committed to the main branch, you may want to delete your now redundant bug fix branch on GitHub. Branches can be deleted by selecting 'edit' and then 'delete repository' from the bottom of the edit page.
It is mandatory to merge with the current trunk of Biopython before submitting your changes to avoid excess work on the receiving side.
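If you choose to attach a patch, one possible way to produce it (not necessarily the one the developers prefer) is git format-patch, which writes the commits on your branch in a form that can be mailed or attached. The sketch below runs in a throwaway repository; the branch name, file contents and output file name are hypothetical.

```shell
# Scratch repository with one commit on master.
work=$(mktemp -d)
git init -q "$work/sandbox"
cd "$work/sandbox"
git symbolic-ref HEAD refs/heads/master
git config user.name "David Jones"
git config user.email "firstname.lastname@example.org"
echo "v1" > README
git add README
git commit -q -m "initial commit"

# One commit of your own on a named branch.
git checkout -q -b demo-branch
echo "v2" > README
git commit -q -a -m "update README"

# Write every commit that is on demo-branch but not on master to one file.
git format-patch --stdout master..demo-branch > "$work/demo-branch.patch"
```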
Since git is a fully distributed version control system, anyone can integrate changes from other people, assuming that they are using branches derived from a common root. This is especially useful for people working on new features who want to accept contributions from other people.
This section is going to be of particular interest for the Biopython core developers, or anyone accepting changes on a branch.
For example, suppose Eric has some interesting changes on his public repository:
You must tell git about this by creating a reference to this remote repository:
$ git remote add eric git://github.com/etal/biopython.git
Now we can fetch all of Eric's public repository with one line:
$ git fetch eric
remote: Counting objects: 138, done.
remote: Compressing objects: 100% (105/105), done.
remote: Total 105 (delta 77), reused 0 (delta 0)
Receiving objects: 100% (105/105), 27.53 KiB, done.
Resolving deltas: 100% (77/77), completed with 24 local objects.
From git://github.com/etal/biopython
 * [new branch]      bug2754    -> eric/bug2754
 * [new branch]      master     -> eric/master
 * [new branch]      pdbtidy    -> eric/pdbtidy
 * [new branch]      phyloxml   -> eric/phyloxml
Now we can run a diff between any of our own branches and any of Eric's branches. You can list your own branches with:
$ git branch
* master
...
Remember the asterisk shows which branch is currently checked out.
To list the remote branches you have set up:
$ git branch -r
  eric/bug2754
  eric/master
  eric/pdbtidy
  eric/phyloxml
  upstream/master
  origin/HEAD
  origin/master
...
For example, to show the difference between your master branch and Eric's master branch:
$ git diff master eric/master
...
If you are both keeping your master branch in sync with the upstream Biopython repository, then his master branch won't be very interesting. Instead, try:
$ git diff master eric/pdbtidy
...
You might now want to merge in (some) of Eric's changes to a new branch on your local repository.
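One way to do this, sketched below under stated assumptions, is to create a fresh branch from your master and merge the remote branch into it, which leaves master itself untouched. The example runs entirely offline: a local repository named eric-repo stands in for Eric's public repository, and the branch name try-pdbtidy is a hypothetical choice.

```shell
work=$(mktemp -d)
cd "$work"

# "eric-repo" stands in for Eric's public repository.
git init -q eric-repo
cd eric-repo
git symbolic-ref HEAD refs/heads/master
git config user.name "Eric"
git config user.email "eric@example.org"
echo "shared" > README
git add README
git commit -q -m "common root"
git checkout -q -b pdbtidy
echo "tidy" > pdbtidy.txt
git add pdbtidy.txt
git commit -q -m "pdbtidy work"
git checkout -q master
cd ..

# Your repository shares the same root commit.
git clone -q eric-repo mine
cd mine
git config user.name "David Jones"
git config user.email "firstname.lastname@example.org"
git remote add eric "$work/eric-repo"
git fetch -q eric

# Try the merge on a fresh branch so master stays clean.
git checkout -q -b try-pdbtidy master
git merge -q --no-edit eric/pdbtidy
```

If the result is not what you wanted, you can switch back to master and delete the trial branch without any effect on your other work.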
If you later want to remove the reference to this particular branch:
$ git branch -r -d eric/pdbtidy
Deleted remote branch eric/pdbtidy (79b5974)
Or, to delete the references to all of Eric's branches:
$ git remote rm eric
$ git branch -r
  upstream/master
  origin/HEAD
  origin/master
...
Alternatively, from within GitHub you can use the fork-queue to cherry pick commits from other people's forked branches. See this github blog post for details. While this defaults to applying the changes to your current branch, you would typically do this using a new integration branch, then fetch it to your local machine to test everything, before merging it to your main branch.
There are many good guides to using Git on the web:
- Git Community Book
- Understanding Git Conceptually
- git ready: git tips
- http://projects.scipy.org/numpy/wiki/GitWorkflow (NumPy is also evaluating git)