
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://www.biopython.org/w/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
		<id>http://www.biopython.org/w/api.php?action=feedcontributions&amp;user=Mdehoon&amp;feedformat=atom</id>
		<title>Biopython - User contributions [en]</title>
		<link rel="self" type="application/atom+xml" href="http://www.biopython.org/w/api.php?action=feedcontributions&amp;user=Mdehoon&amp;feedformat=atom"/>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/Special:Contributions/Mdehoon"/>
		<updated>2013-05-19T00:22:39Z</updated>
		<subtitle>User contributions</subtitle>
		<generator>MediaWiki 1.18.1</generator>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T09:05:10Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How do I determine secondary structure? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
Hamelryck, T., Manderick, B.: &amp;quot;PDB parser and structure class implemented in Python&amp;quot;. ''Bioinformatics'' '''19''': 2308&amp;amp;ndash;2310 (2003)&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the [http://www.binf.ku.dk/users/thamelry/references.html Bioinformatics journal website].&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a Python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a Python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# The with statement closes the file even if parsing fails&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
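&lt;br /&gt;
As a rough illustration of the two layouts, the following sketch computes where an entry would be expected to end up locally. It is plain Python and not part of Bio.PDB: the helper &amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; is a hypothetical name, and it assumes that the PDB-style layout uses a subdirectory named after the two middle characters of the PDB ID (as on the wwPDB ftp server) and that files are named like the &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; example above.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
&lt;br /&gt;
def local_pdb_path(pdb_id, root, flat=False):&lt;br /&gt;
    # Hypothetical helper: expected local path of a downloaded entry&lt;br /&gt;
    code = pdb_id.lower()&lt;br /&gt;
    filename = 'pdb%s.ent' % code&lt;br /&gt;
    if flat:&lt;br /&gt;
        # Everything in one directory (what the -d option asks for)&lt;br /&gt;
        return os.path.join(root, filename)&lt;br /&gt;
    # Assumed PDB-style layout: subdirectory = two middle characters of the id&lt;br /&gt;
    return os.path.join(root, code[1:3], filename)&lt;br /&gt;
&lt;br /&gt;
print(local_pdb_path('1FAT', '/data/pdb'))&lt;br /&gt;
print(local_pdb_path('1FAT', '/data/pdb', flat=True))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
A helper like this can be handy for checking whether an entry is already present in a local mirror before asking &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; to fetch it.&lt;br /&gt;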
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can deal with such files&lt;br /&gt;
in one of two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
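&lt;br /&gt;
The sketch below shows &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;get_parent&amp;lt;/code&amp;gt; on a tiny structure that is built by hand, so that it runs without a PDB file. Building entities directly like this is only done here for illustration (normally a parser does it for you); the names, coordinates and B-factors are made up, and NumPy is assumed to be installed alongside Biopython.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Structure import Structure&lt;br /&gt;
from Bio.PDB.Model import Model&lt;br /&gt;
from Bio.PDB.Chain import Chain&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
&lt;br /&gt;
# One model, one chain, one Gly residue, two atoms (toy values)&lt;br /&gt;
structure = Structure('toy')&lt;br /&gt;
model = Model(0)&lt;br /&gt;
chain = Chain('A')&lt;br /&gt;
residue = Residue((' ', 1, ' '), 'GLY', ' ')&lt;br /&gt;
# Atom(name, coord, bfactor, occupancy, altloc, fullname, serial_number, element)&lt;br /&gt;
n = Atom('N', numpy.array([0.0, 0.0, 0.0]), 20.0, 1.0, ' ', ' N  ', 1, 'N')&lt;br /&gt;
ca = Atom('CA', numpy.array([1.5, 0.0, 0.0]), 20.0, 1.0, ' ', ' CA ', 2, 'C')&lt;br /&gt;
residue.add(n)&lt;br /&gt;
residue.add(ca)&lt;br /&gt;
chain.add(residue)&lt;br /&gt;
model.add(chain)&lt;br /&gt;
structure.add(model)&lt;br /&gt;
&lt;br /&gt;
# Test for children by id&lt;br /&gt;
print(chain.has_id(1))       # is there a residue 1 in this chain?&lt;br /&gt;
print(residue.has_id('CB'))  # does this residue have a CB atom?&lt;br /&gt;
# Walk back up from a child to its parent&lt;br /&gt;
print(ca.get_parent() is residue)&lt;br /&gt;
# The full id lists the ids of all levels, from structure down to atom&lt;br /&gt;
print(ca.get_full_id())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Checking with &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; first avoids a &amp;lt;code&amp;gt;KeyError&amp;lt;/code&amp;gt; when indexing an entity with an id that is not there.&lt;br /&gt;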
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag were not used.&lt;br /&gt;
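&lt;br /&gt;
To see why the hetero-flag matters, here is a small sketch with a hand-built chain in which an amino acid and a water share sequence identifier 100. The chain and its residues are made up for the example and contain no atoms; with a parsed structure you would index the chain in exactly the same way.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Chain import Chain&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
&lt;br /&gt;
chain = Chain('A')&lt;br /&gt;
# Residue(id, resname, segid): an amino acid and a water, both numbered 100&lt;br /&gt;
chain.add(Residue((' ', 100, ' '), 'GLY', ' '))&lt;br /&gt;
chain.add(Residue(('W', 100, ' '), 'HOH', ' '))&lt;br /&gt;
&lt;br /&gt;
# The integer shortcut only matches a blank hetero-flag and insertion code&lt;br /&gt;
print(chain[100].get_resname())&lt;br /&gt;
# Waters and hetero-residues need the full id tuple&lt;br /&gt;
print(chain[('W', 100, ' ')].get_resname())&lt;br /&gt;
# get_id() always returns the full three-element tuple&lt;br /&gt;
print(chain[100].get_id())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;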
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; each dot stands for a space). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
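&lt;br /&gt;
The difference between the stripped name and the name as it appears in the file can be inspected on the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object itself. The following sketch creates two stand-alone atoms by hand (made-up coordinates and B-factors, NumPy assumed to be installed) purely to show &amp;lt;code&amp;gt;get_name&amp;lt;/code&amp;gt; next to &amp;lt;code&amp;gt;get_fullname&amp;lt;/code&amp;gt;; it does not reproduce what the parser does when names clash.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
&lt;br /&gt;
coord = numpy.array([0.0, 0.0, 0.0])&lt;br /&gt;
# Same stripped name, but padded differently in the four-character name field&lt;br /&gt;
c_alpha = Atom('CA', coord, 20.0, 1.0, ' ', ' CA ', 1, 'C')&lt;br /&gt;
calcium = Atom('CA', coord, 20.0, 1.0, ' ', 'CA  ', 2, 'CA')&lt;br /&gt;
&lt;br /&gt;
for atom in (c_alpha, calcium):&lt;br /&gt;
    # get_name() is the stripped name, get_fullname() keeps the spaces&lt;br /&gt;
    print('%r %r %s' % (atom.get_name(), atom.get_fullname(), atom.element))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you need to tell calcium from C&amp;amp;alpha; reliably, comparing the full name (or the element) is safer than relying on the stripped name alone.&lt;br /&gt;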
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
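To make the forwarding behaviour concrete, the sketch below wraps two hand-made alternative positions of a Ser OG atom in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;. The coordinates and occupancies are invented for the example and NumPy is assumed to be installed; in a parsed structure the &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are created for you.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Atom import Atom, DisorderedAtom&lt;br /&gt;
&lt;br /&gt;
# Two alternative positions of the same OG atom, with different occupancies&lt;br /&gt;
og_a = Atom('OG', numpy.array([1.0, 0.0, 0.0]), 20.0, 0.3, 'A', ' OG ', 1, 'O')&lt;br /&gt;
og_b = Atom('OG', numpy.array([0.0, 1.0, 0.0]), 20.0, 0.7, 'B', ' OG ', 2, 'O')&lt;br /&gt;
&lt;br /&gt;
disordered = DisorderedAtom('OG')&lt;br /&gt;
disordered.disordered_add(og_a)&lt;br /&gt;
disordered.disordered_add(og_b)&lt;br /&gt;
&lt;br /&gt;
# Calls are forwarded to the selected Atom, by default the highest occupancy one&lt;br /&gt;
default_altloc = disordered.get_altloc()&lt;br /&gt;
print('%s %.1f' % (default_altloc, disordered.get_occupancy()))&lt;br /&gt;
&lt;br /&gt;
# Switch to the other position via its altloc identifier&lt;br /&gt;
disordered.disordered_select('A')&lt;br /&gt;
print('%s %.1f' % (disordered.get_altloc(), disordered.get_occupancy()))&lt;br /&gt;
&lt;br /&gt;
# Check which altloc identifiers are available&lt;br /&gt;
print(disordered.disordered_has_id('B'))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;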
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (ie.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
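&lt;br /&gt;
The forwarding behaviour described in this entry can be pictured with a toy wrapper.&lt;br /&gt;
This is only a minimal sketch, not Bio.PDB's actual implementation: the class names&lt;br /&gt;
&amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedAtom&amp;lt;/code&amp;gt; are hypothetical, and only the idea&lt;br /&gt;
(select the highest-occupancy child by default, forward unknown names to it) follows the text.&lt;br /&gt;

```python
class ToyAtom:
    """Hypothetical stand-in for an Atom: just an altloc and an occupancy."""

    def __init__(self, altloc, occupancy):
        self.altloc = altloc
        self.occupancy = occupancy

    def get_altloc(self):
        return self.altloc

    def get_occupancy(self):
        return self.occupancy


class ToyDisorderedAtom:
    """Hypothetical wrapper that forwards unknown attributes to one child."""

    def __init__(self):
        self.children = {}    # altloc identifier mapped to ToyAtom
        self.selected = None  # the child that currently receives calls

    def disordered_add(self, atom):
        self.children[atom.get_altloc()] = atom
        # By default, the child with the highest occupancy is selected.
        if self.selected is None or atom.get_occupancy() > self.selected.get_occupancy():
            self.selected = atom

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails,
        # so it catches exactly the names the wrapper does not define itself.
        selected = self.__dict__.get("selected")
        if selected is None:
            raise AttributeError(name)
        return getattr(selected, name)


atom = ToyDisorderedAtom()
atom.disordered_add(ToyAtom("A", 0.4))
atom.disordered_add(ToyAtom("B", 0.6))
print(atom.get_altloc())      # forwarded to the higher-occupancy child
atom.disordered_select("A")
print(atom.get_altloc())      # now forwarded to the altloc A child
```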
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (eg. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
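&lt;br /&gt;
For readers who want to see the geometry spelled out, here is a plain-Python sketch of the angle&lt;br /&gt;
at the middle point of three coordinates. It is an illustration of the underlying math, not the&lt;br /&gt;
code of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module; the function name &amp;lt;code&amp;gt;angle_at_middle&amp;lt;/code&amp;gt; and the&lt;br /&gt;
coordinates are made up, and the result is in radians.&lt;br /&gt;

```python
import math


def angle_at_middle(p1, p2, p3):
    """Angle in radians at p2, between the directions towards p1 and p3."""
    a = [x - y for x, y in zip(p1, p2)]
    b = [x - y for x, y in zip(p3, p2)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


# Three made-up points forming a right angle at the origin
angle = angle_at_middle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(math.degrees(angle))
```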
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
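&lt;br /&gt;
The torsion angle itself can be written down in a few lines of plain Python. The sketch below&lt;br /&gt;
uses the common atan2 formulation with three bond vectors; it is not taken from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module, the helper names are hypothetical, and the sign convention (positive for a clockwise&lt;br /&gt;
rotation when looking from the second point towards the third) is an assumption of this example.&lt;br /&gt;

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians, in the range -pi to pi."""
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b3 = sub(p4, p3)
    n1 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)  # normal of the plane through p2, p3, p4
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


# Four made-up points with a quarter-turn twist around the z axis
torsion = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(math.degrees(torsion))
```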
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
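&lt;br /&gt;
To make clear what a contact search returns, here is a brute-force sketch in plain Python.&lt;br /&gt;
It is not how &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt; works internally (that relies on the KD tree);&lt;br /&gt;
the function name &amp;lt;code&amp;gt;find_contacts&amp;lt;/code&amp;gt; and the coordinates are made up for illustration.&lt;br /&gt;

```python
import math
from itertools import combinations


def find_contacts(coords, radius):
    """Return index pairs (i, j) of points no further apart than radius.

    Brute force over all pairs: fine for a handful of atoms, but the cost
    grows quadratically, which is why a KD tree pays off on real structures.
    """
    contacts = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        if radius >= distance:
            contacts.append((i, j))
    return contacts


# Made-up coordinates (in angstrom) for three atoms
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(find_contacts(coords, 2.0))
```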
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is possible to use &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print(seq)&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the Table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;text-align: center;&amp;quot;&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! DSSP Code&lt;br /&gt;
! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
|H&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
|B&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
|E&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | Strand&lt;br /&gt;
|-&lt;br /&gt;
|G&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
|I&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
|T&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | Turn&lt;br /&gt;
|-&lt;br /&gt;
|S&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | Bend&lt;br /&gt;
|-&lt;br /&gt;
| -&lt;br /&gt;
| align=&amp;quot;left&amp;quot; | Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
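&lt;br /&gt;
The counting idea behind HSE can be sketched without Bio.PDB at all. The plain-Python example&lt;br /&gt;
below is only an illustration under stated assumptions: the function name &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt;&lt;br /&gt;
is hypothetical, the coordinates are made up, and the side-chain direction is simply passed in&lt;br /&gt;
as a vector instead of being derived from real atoms.&lt;br /&gt;

```python
import math


def half_sphere_counts(center, direction, others, radius=13.0):
    """Count points within radius of center, split into two half spheres.

    The dividing plane goes through center and is perpendicular to direction.
    Returns (up, down), where up is the half sphere that direction points into.
    """
    up = 0
    down = 0
    for point in others:
        offset = [p - c for p, c in zip(point, center)]
        if math.sqrt(sum(x * x for x in offset)) > radius:
            continue  # outside the sphere, not counted at all
        if sum(o * d for o, d in zip(offset, direction)) > 0:
            up += 1
        else:
            down += 1
    return up, down


# Made-up neighbouring positions around a residue placed at the origin
neighbours = [(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (0.0, 0.0, -6.0), (20.0, 0.0, 0.0)]
up, down = half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), neighbours)
print(up, down, up + down)  # the sum corresponds to a plain contact number
```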
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0 / 180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
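&lt;br /&gt;
If you are curious what a rotation around an axis boils down to, the sketch below applies&lt;br /&gt;
Rodrigues' rotation formula in plain Python. It is a stand-in for illustration, not the code&lt;br /&gt;
of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module: the function name &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; is hypothetical,&lt;br /&gt;
the backbone coordinates are made up, and the right-handed rotation sense is an assumption.&lt;br /&gt;

```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate vector v by theta radians about axis (right-handed sense)."""
    length = math.sqrt(sum(a * a for a in axis))
    k = [a / length for a in axis]  # unit vector along the rotation axis
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(a * b for a, b in zip(k, v))
    cos_t = math.cos(theta)
    sin_t = math.sin(theta)
    # Rodrigues: v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t))
    return tuple(v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
                 for i in range(3))


# Made-up backbone coordinates for a single residue
n = (-0.5, 1.4, 0.0)
ca = (0.0, 0.0, 0.0)
c = (1.5, 0.0, 0.0)

# Same recipe as in the text: center on CA, rotate N by -120 degrees
# about the CA-C direction, then move the result back onto CA
n_rel = tuple(a - b for a, b in zip(n, ca))
c_rel = tuple(a - b for a, b in zip(c, ca))
cb_rel = rotate_about_axis(n_rel, c_rel, -math.pi * 120.0 / 180.0)
cb = tuple(a + b for a, b in zip(cb_rel, ca))
print(cb)
```

A rotation never changes the length of a vector, so the virtual C&amp;amp;beta; ends up at the same&lt;br /&gt;
distance from C&amp;amp;alpha; as the N atom, which is part of why the text says "roughly".&lt;br /&gt;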
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix Computations'',&lt;br /&gt;
2nd ed., Golub, G. &amp;amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
[mailto:thamelry@binf.ku.dk Thomas Hamelryck] &amp;lt;br&amp;gt;&lt;br /&gt;
Bioinformatics center &amp;lt;br&amp;gt;&lt;br /&gt;
Institute of Molecular Biology &amp;lt;br&amp;gt;&lt;br /&gt;
University of Copenhagen &amp;lt;br&amp;gt;&lt;br /&gt;
Universitetsparken 15, Bygning 10 &amp;lt;br&amp;gt;&lt;br /&gt;
DK-2100 København Ø &amp;lt;br&amp;gt;&lt;br /&gt;
Denmark&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:48:16Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Contributors */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active-site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
Hamelryck, T., Manderick, B.: &amp;quot;PDB parser and structure class implemented in Python&amp;quot;. ''Bioinformatics'' '''19''': 2308&amp;amp;ndash;2310 (2003)&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the [http://www.binf.ku.dk/users/thamelry/references.html Bioinformatics journal website].&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
With that caveat out of the way, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
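from Bio.PDB.parse_pdb_header import parse_pdb_header&lt;br /&gt;
# The with-block closes the file even if parsing fails&lt;br /&gt;
with open(filename) as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;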
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBList&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). To deal with this, the PDBParser object can behave in&lt;br /&gt;
two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
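&lt;br /&gt;
A minimal sketch of how &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; might be called. To keep the example self-contained it builds a toy three-residue chain in memory; the coordinates and the file name &amp;lt;code&amp;gt;fragment.pdb&amp;lt;/code&amp;gt; are made up for illustration, and the argument order (structure, chain id, start, end, file name) is an assumption that you should check against the API documentation of your Biopython version:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
import tempfile&lt;br /&gt;
&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
from Bio.PDB.Chain import Chain&lt;br /&gt;
from Bio.PDB.Dice import extract&lt;br /&gt;
from Bio.PDB.Model import Model&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
from Bio.PDB.Structure import Structure&lt;br /&gt;
&lt;br /&gt;
# Build a toy structure: one chain with three glycines (CA atoms only)&lt;br /&gt;
structure = Structure('toy')&lt;br /&gt;
model = Model(0)&lt;br /&gt;
chain = Chain('A')&lt;br /&gt;
structure.add(model)&lt;br /&gt;
model.add(chain)&lt;br /&gt;
for number in (1, 2, 3):&lt;br /&gt;
    residue = Residue((' ', number, ' '), 'GLY', ' ')&lt;br /&gt;
    coord = numpy.array([3.8 * number, 0.0, 0.0], 'f')  # made-up coordinates&lt;br /&gt;
    residue.add(Atom('CA', coord, 20.0, 1.0, ' ', ' CA ', number, element='C'))&lt;br /&gt;
    chain.add(residue)&lt;br /&gt;
&lt;br /&gt;
# Write residues 2 to 3 of chain A to a file in a temporary directory&lt;br /&gt;
filename = os.path.join(tempfile.mkdtemp(), 'fragment.pdb')&lt;br /&gt;
extract(structure, 'A', 2, 3, filename)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;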
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture : &lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
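&lt;br /&gt;
For example (a small sketch using a toy residue built by hand with made-up coordinates, rather than one read from a PDB file):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
&lt;br /&gt;
# Toy residue with a single CA atom (made-up coordinates)&lt;br /&gt;
residue = Residue((' ', 1, ' '), 'GLY', ' ')&lt;br /&gt;
coord = numpy.array([0.0, 0.0, 0.0], 'f')&lt;br /&gt;
residue.add(Atom('CA', coord, 20.0, 1.0, ' ', ' CA ', 1, element='C'))&lt;br /&gt;
&lt;br /&gt;
# Test for children by id&lt;br /&gt;
has_ca = residue.has_id('CA')&lt;br /&gt;
has_cb = residue.has_id('CB')&lt;br /&gt;
# The child knows its parent&lt;br /&gt;
parent = residue['CA'].get_parent()&lt;br /&gt;
print(has_ca, has_cb, parent is residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;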
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
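&lt;br /&gt;
The following sketch illustrates this with a hand-built toy chain (not taken from a real PDB file) in which a glycine and a water share sequence identifier 100; the hetero-flag keeps their ids distinct:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Chain import Chain&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
&lt;br /&gt;
chain = Chain('A')&lt;br /&gt;
# An amino acid and a water sharing sequence identifier 100&lt;br /&gt;
chain.add(Residue((' ', 100, ' '), 'GLY', ' '))&lt;br /&gt;
chain.add(Residue(('W', 100, ' '), 'HOH', ' '))&lt;br /&gt;
&lt;br /&gt;
gly = chain[100]                 # shortcut id: blank hetero-flag and insertion code&lt;br /&gt;
water = chain[('W', 100, ' ')]   # the water needs the full id&lt;br /&gt;
print(gly.get_resname(), water.get_resname())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;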
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; the dots stand for spaces here). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
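As an illustration, here is a small sketch that builds a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; by hand from two made-up positions of a Ser OG atom (normally the parser does this for you):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Atom import Atom, DisorderedAtom&lt;br /&gt;
&lt;br /&gt;
# Two alternative positions (altloc A and B) of the same OG atom; values are made up&lt;br /&gt;
og_a = Atom('OG', numpy.array([1.0, 0.0, 0.0], 'f'), 20.0, 0.4, 'A', ' OG ', 1, element='O')&lt;br /&gt;
og_b = Atom('OG', numpy.array([0.0, 1.0, 0.0], 'f'), 20.0, 0.6, 'B', ' OG ', 2, element='O')&lt;br /&gt;
&lt;br /&gt;
atom = DisorderedAtom('OG')&lt;br /&gt;
atom.disordered_add(og_a)&lt;br /&gt;
atom.disordered_add(og_b)&lt;br /&gt;
&lt;br /&gt;
default_altloc = atom.get_altloc()    # altloc of the highest-occupancy position&lt;br /&gt;
atom.disordered_select('A')&lt;br /&gt;
selected_altloc = atom.get_altloc()   # altloc of the position selected by hand&lt;br /&gt;
print(default_altloc, selected_altloc)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;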
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factors if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
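&lt;br /&gt;
Note that &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; returns the angle in radians. A small sketch with three made-up points (built directly as &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; objects rather than taken from atoms) that converts the result to degrees:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
from Bio.PDB import Vector, calc_angle&lt;br /&gt;
&lt;br /&gt;
# Three made-up points forming a right angle at the middle point&lt;br /&gt;
vector1 = Vector(1.0, 0.0, 0.0)&lt;br /&gt;
vector2 = Vector(0.0, 0.0, 0.0)&lt;br /&gt;
vector3 = Vector(0.0, 1.0, 0.0)&lt;br /&gt;
&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)   # in radians&lt;br /&gt;
angle_degrees = math.degrees(angle)&lt;br /&gt;
print(angle_degrees)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;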
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
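&lt;br /&gt;
The result of &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; is in radians. A small sketch with four made-up points whose torsion angle has a magnitude of 90 degrees (the sign depends on the handedness of the four points, so it is not asserted here):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
from Bio.PDB import Vector, calc_dihedral&lt;br /&gt;
&lt;br /&gt;
# Four made-up points; the last one is rotated a quarter turn about the central bond&lt;br /&gt;
vector1 = Vector(0.0, 1.0, 0.0)&lt;br /&gt;
vector2 = Vector(0.0, 0.0, 0.0)&lt;br /&gt;
vector3 = Vector(1.0, 0.0, 0.0)&lt;br /&gt;
vector4 = Vector(1.0, 0.0, 1.0)&lt;br /&gt;
&lt;br /&gt;
torsion = calc_dihedral(vector1, vector2, vector3, vector4)   # in radians&lt;br /&gt;
torsion_degrees = math.degrees(torsion)&lt;br /&gt;
print(abs(torsion_degrees))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;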
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
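&lt;br /&gt;
A minimal sketch of a neighbor search, using three hand-made atoms with made-up coordinates (in practice you would pass in a list of atoms from a parsed structure, e.g. obtained via &amp;lt;code&amp;gt;Selection.unfold_entities(structure, 'A')&amp;lt;/code&amp;gt;):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB import NeighborSearch&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
&lt;br /&gt;
# Three made-up atoms on the x axis, at 0, 1.5 and 10 angstrom&lt;br /&gt;
atoms = [&lt;br /&gt;
    Atom('N', numpy.array([0.0, 0.0, 0.0], 'f'), 20.0, 1.0, ' ', ' N  ', 1, element='N'),&lt;br /&gt;
    Atom('CA', numpy.array([1.5, 0.0, 0.0], 'f'), 20.0, 1.0, ' ', ' CA ', 2, element='C'),&lt;br /&gt;
    Atom('C', numpy.array([10.0, 0.0, 0.0], 'f'), 20.0, 1.0, ' ', ' C  ', 3, element='C'),&lt;br /&gt;
]&lt;br /&gt;
&lt;br /&gt;
ns = NeighborSearch(atoms)&lt;br /&gt;
# All atoms within 2.0 angstrom of the origin&lt;br /&gt;
center = numpy.array([0.0, 0.0, 0.0])&lt;br /&gt;
close_atoms = ns.search(center, 2.0)&lt;br /&gt;
close_names = sorted(atom.get_name() for atom in close_atoms)&lt;br /&gt;
print(close_names)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;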
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use the polypeptide builder classes (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N &lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is possible to use the builder classes&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and consequently the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
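&lt;br /&gt;
To make the idea behind the measure concrete, here is a plain Python sketch of the counting step&lt;br /&gt;
described above: points within the radius are split according to which side of the residue they lie on.&lt;br /&gt;
It is not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code; the function name &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt;, the coordinates&lt;br /&gt;
and the way the side chain direction is supplied are assumptions made for illustration:&lt;br /&gt;

```python
def half_sphere_counts(center, direction, others, radius=13.0):
    """Count points within radius of center, split by the side of the plane
    through center that is perpendicular to direction."""
    up = down = 0
    for p in others:
        v = [p[k] - center[k] for k in range(3)]
        if sum(x * x for x in v) > radius * radius:
            continue  # outside the sphere, ignored
        if sum(v[k] * direction[k] for k in range(3)) > 0:
            up += 1  # on the side the direction vector points to
        else:
            down += 1  # on the opposite side
    return up, down


# Made-up positions: one CA atom at the origin and three neighbouring CA atoms
ca = (0.0, 0.0, 0.0)
side_chain = (0.0, 0.0, 1.0)  # e.g. the CA-to-CB vector for the HSE-beta flavor
neighbours = [(0.0, 0.0, 5.0), (3.0, 0.0, -4.0), (0.0, 20.0, 0.0)]
print(half_sphere_counts(ca, side_chain, neighbours))
```

The two counts together form the 2D measure; their sum is the classical contact number for the same radius.&lt;br /&gt;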
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross product (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;),&lt;br /&gt;
dot product (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;), angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
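&lt;br /&gt;
The rotation itself is nothing magical. As a rough sketch of the underlying math, here is a rotation&lt;br /&gt;
of one vector around an axis written in plain Python using Rodrigues' rotation formula. The helper&lt;br /&gt;
&amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; and the coordinates are made up for illustration, and the right-handed&lt;br /&gt;
sign convention used here is an assumption that may not match the one used by &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt;:&lt;br /&gt;

```python
from math import cos, sin, pi, sqrt


def rotate_about_axis(v, axis, theta):
    """Rotate 3D vector v by theta radians around axis (Rodrigues' formula)."""
    norm = sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the rotation axis
    k_dot_v = sum(k[i] * v[i] for i in range(3))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    c, s = cos(theta), sin(theta)
    return [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1 - c) for i in range(3)]


# Made-up coordinates, already centred on the CA atom as in the example above
n = [1.0, 0.0, 0.0]
c_axis = [0.0, 0.0, 1.0]
cb = rotate_about_axis(n, c_axis, -120.0 * pi / 180.0)
print([round(x, 3) for x in cb])
```

As in the Bio.PDB example, the result still has to be translated back by adding the CA position.&lt;br /&gt;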
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix&lt;br /&gt;
Computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
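&lt;br /&gt;
For reference, the quantity being minimized can be written down in a few lines of plain Python.&lt;br /&gt;
This sketch only evaluates the RMSD of two coordinate lists as they are; it does not search for the&lt;br /&gt;
optimal rotation and translation, which is the job of &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt;. The function name&lt;br /&gt;
&amp;lt;code&amp;gt;rmsd&amp;lt;/code&amp;gt; and the coordinates are made up for illustration:&lt;br /&gt;

```python
from math import sqrt


def rmsd(fixed, moving):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(fixed) != len(moving):
        raise ValueError("both lists must contain the same number of points")
    total = 0.0
    for a, b in zip(fixed, moving):
        total += sum((a[k] - b[k]) ** 2 for k in range(3))
    return sqrt(total / len(fixed))


# Two made-up point sets, the second shifted by 1 angstrom along z
fixed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moving = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(fixed, moving))
```

The length check mirrors the requirement that both atom lists hold the same number of atoms.&lt;br /&gt;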
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
[mailto:thamelry@binf.ku.dk Thomas Hamelryck] &amp;lt;br&amp;gt;&lt;br /&gt;
Bioinformatics center &amp;lt;br&amp;gt;&lt;br /&gt;
Institute of Molecular Biology &amp;lt;br&amp;gt;&lt;br /&gt;
University of Copenhagen &amp;lt;br&amp;gt;&lt;br /&gt;
Universitetsparken 15, Bygning 10 &amp;lt;br&amp;gt;&lt;br /&gt;
DK-2100 København Ø &amp;lt;br&amp;gt;&lt;br /&gt;
Denmark&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:45:09Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Is there a Bio.PDB reference? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active-site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
Hamelryck, T., Manderick, B.: &amp;quot;PDB parser and structure class implemented in Python&amp;quot;. ''Bioinformatics'' '''19''': 2308&amp;amp;ndash;2310 (2003)&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the [http://www.binf.ku.dk/users/thamelry/references.html Bioinformatics journal website].&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
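&lt;br /&gt;
Assuming the values in the dictionary are stored as text taken straight from the file (check this&lt;br /&gt;
against your Biopython version), they need converting before doing arithmetic. A minimal sketch,&lt;br /&gt;
using a hand-made stand-in for &amp;lt;code&amp;gt;mmcif_dict&amp;lt;/code&amp;gt; with made-up values:&lt;br /&gt;

```python
# Stand-in for the dictionary returned by MMCIF2Dict (values are made up)
mmcif_dict = {
    '_exptl_crystal.density_percent_sol': '55.2',
    '_atom_site.Cartn_y': ['10.5', '11.25', '-3.0'],
}

# Convert the text values to floats before using them as numbers
y_list = [float(y) for y in mmcif_dict['_atom_site.Cartn_y']]
y_extent = max(y_list) - min(y_list)  # extent of the structure along y
print(y_extent)
```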
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can handle these&lt;br /&gt;
in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
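from Bio.PDB.PDBIO import PDBIO, Select&lt;br /&gt;
&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        # Residue objects expose their name via get_resname()&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;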
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
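&lt;br /&gt;
The tuple layout described above can be illustrated with a small, self-contained sketch. The helper names &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;residue_kind&amp;lt;/code&amp;gt; are hypothetical and not part of Bio.PDB; the sketch only mimics how a shortcut id relates to a full id, and how the hetero-flag tells amino acids, hetero-residues and waters apart:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
def full_residue_id(key):&lt;br /&gt;
    """Expand a shortcut residue id (a bare integer) into the full tuple."""&lt;br /&gt;
    if isinstance(key, int):&lt;br /&gt;
        # Blank hetero-flag and blank insertion code&lt;br /&gt;
        return (' ', key, ' ')&lt;br /&gt;
    return key&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def residue_kind(residue_id):&lt;br /&gt;
    """Classify a full residue id by its hetero-flag (the first element)."""&lt;br /&gt;
    hetero_flag = residue_id[0]&lt;br /&gt;
    if hetero_flag == 'W':&lt;br /&gt;
        return 'water'&lt;br /&gt;
    if hetero_flag.startswith('H_'):&lt;br /&gt;
        return 'hetero-residue'&lt;br /&gt;
    return 'standard residue'&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
print(full_residue_id(100))&lt;br /&gt;
print(residue_kind(('H_GLC', 100, 'A')))&lt;br /&gt;
print(residue_kind(('W', 100, ' ')))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Because the hetero-flag is part of the id, an amino acid and a water that share sequence identifier 100 still get different ids.&lt;br /&gt;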
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; the dots stand for spaces). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
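For readers who want to see what such a torsion-angle calculation involves, here is a minimal pure-Python sketch based on the standard atan2 formula. It is not Bio.PDB's implementation: the function &amp;lt;code&amp;gt;dihedral&amp;lt;/code&amp;gt; and its helpers are made up for illustration, points are plain coordinate tuples, and the sign convention should be checked against &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; before relying on it:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import atan2, sqrt&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def sub(a, b):&lt;br /&gt;
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def dot(a, b):&lt;br /&gt;
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def cross(a, b):&lt;br /&gt;
    return (a[1] * b[2] - a[2] * b[1],&lt;br /&gt;
            a[2] * b[0] - a[0] * b[2],&lt;br /&gt;
            a[0] * b[1] - a[1] * b[0])&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
def dihedral(p1, p2, p3, p4):&lt;br /&gt;
    """Torsion angle (in radians) defined by four points."""&lt;br /&gt;
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)&lt;br /&gt;
    n1 = cross(b1, b2)  # normal of the plane through p1, p2, p3&lt;br /&gt;
    n2 = cross(b2, b3)  # normal of the plane through p2, p3, p4&lt;br /&gt;
    y = sqrt(dot(b2, b2)) * dot(b1, n2)&lt;br /&gt;
    x = dot(n1, n2)&lt;br /&gt;
    return atan2(y, x)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
# cis arrangement: p1 and p4 on the same side of the central bond&lt;br /&gt;
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))&lt;br /&gt;
# trans arrangement: p1 and p4 on opposite sides&lt;br /&gt;
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;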
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder, i.e. the &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt; class. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Polypeptide import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb=PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb=CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is possible to use both builders&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB.&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. The latter class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
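To make the counting idea concrete, here is a rough pure-Python sketch in the spirit of HSE&amp;amp;beta;. It is an illustration only, not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code: the function name &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; is hypothetical, coordinates are plain tuples, and atoms lying exactly on the dividing plane are (arbitrarily) counted in the side-chain half:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
def half_sphere_counts(ca, cb, other_cas, radius=13.0):&lt;br /&gt;
    """Count CA atoms in the side-chain half sphere (up) and the opposite one (down)."""&lt;br /&gt;
    # The CA-CB vector approximates the side-chain direction&lt;br /&gt;
    direction = [cb[i] - ca[i] for i in range(3)]&lt;br /&gt;
    up = down = 0&lt;br /&gt;
    for other in other_cas:&lt;br /&gt;
        offset = [other[i] - ca[i] for i in range(3)]&lt;br /&gt;
        dist_sq = sum(x * x for x in offset)&lt;br /&gt;
        if dist_sq &amp;gt; radius ** 2:&lt;br /&gt;
            continue  # outside the sphere&lt;br /&gt;
        # The sign of the dot product decides which half sphere the atom is in&lt;br /&gt;
        if sum(d * x for d, x in zip(direction, offset)) &amp;gt;= 0:&lt;br /&gt;
            up += 1&lt;br /&gt;
        else:&lt;br /&gt;
            down += 1&lt;br /&gt;
    return up, down&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
ca = (0.0, 0.0, 0.0)&lt;br /&gt;
cb = (0.0, 0.0, 1.5)&lt;br /&gt;
neighbours = [(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (0.0, 2.0, -6.0), (0.0, 0.0, 20.0)]&lt;br /&gt;
print(half_sphere_counts(ca, cb, neighbours))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The two counts together form the 2D measure; their sum is the classical contact number within the same radius.&lt;br /&gt;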
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
from Bio.PDB import rotaxis&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
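&lt;br /&gt;
For readers who want to see the arithmetic behind an axis rotation, here is a&lt;br /&gt;
minimal pure-Python sketch based on Rodrigues' rotation formula. It is not the&lt;br /&gt;
code of &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt;: the helper name &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; is made up for this&lt;br /&gt;
illustration, a right-handed rotation is assumed, and Bio.PDB's own sign and&lt;br /&gt;
multiplication conventions may differ, so compare against &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; before relying on it.&lt;br /&gt;

```python
from math import cos, sin, sqrt, pi


def rotate_about_axis(v, axis, theta):
    """Rotate 3D vector v by theta radians about axis (Rodrigues' formula).

    Hypothetical helper for illustration only; right-handed rotation assumed.
    """
    norm = sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the rotation axis
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    c, s = cos(theta), sin(theta)
    # v*cos(theta) + (k x v)*sin(theta) + k*(k.v)*(1 - cos(theta))
    return [
        v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
        for i in range(3)
    ]


# Rotate the x unit vector by 90 degrees about the z axis
rotated = rotate_about_axis([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], pi / 2)
```

For the virtual C&amp;amp;beta; case the vector would be the centered N position, the axis the centered C position, and the angle -120 degrees expressed in radians.&lt;br /&gt;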
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix&lt;br /&gt;
computations'', 2nd ed., Golub, G. &amp;amp; Van Loan, C. (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
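&lt;br /&gt;
The remark that the rotation is right multiplying means that each coordinate,&lt;br /&gt;
treated as a row vector, is multiplied by the rotation matrix on its right, after&lt;br /&gt;
which the translation is added. A minimal pure-Python sketch of that arithmetic&lt;br /&gt;
follows. The function &amp;lt;code&amp;gt;apply_rotran&amp;lt;/code&amp;gt; and the sample matrix are assumptions made for&lt;br /&gt;
this illustration and are not part of Bio.PDB.&lt;br /&gt;

```python
def apply_rotran(coord, rot, tran):
    """Return coord * rot + tran, with coord treated as a row vector."""
    return [
        sum(coord[i] * rot[i][j] for i in range(3)) + tran[j]
        for j in range(3)
    ]


# Sample values made up for the illustration
rot = [[0.0, 1.0, 0.0],
       [-1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
tran = [10.0, 0.0, 0.0]

moved = apply_rotran([1.0, 0.0, 0.0], rot, tran)
```

With a left-multiplying convention the same matrix would rotate in the opposite direction, which is why the convention matters when you reuse the stored rotation yourself.&lt;br /&gt;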
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:41:48Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Can I write mmCIF files? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active-site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.MMCIF2Dict import MMCIF2Dict&lt;br /&gt;
&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
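&lt;br /&gt;
Since a tag can map either to a single value or to a list, code that consumes the&lt;br /&gt;
dictionary often has to cope with both. The sketch below uses a small hand-written&lt;br /&gt;
dictionary as a stand-in for the real one. The values are made up, and the sketch&lt;br /&gt;
assumes that values arrive as text, so check what your Biopython version returns.&lt;br /&gt;
The helper &amp;lt;code&amp;gt;as_floats&amp;lt;/code&amp;gt; is hypothetical.&lt;br /&gt;

```python
# Hypothetical stand-in for the dictionary built from an mmCIF file;
# the tags are real mmCIF tags, the values are made up for illustration.
toy_dict = {
    '_exptl_crystal.density_percent_sol': '55.2',
    '_atom_site.Cartn_y': ['10.5', '11.25', '9.0'],
}


def as_floats(value):
    """Return a list of floats for a single value or a list of values."""
    if isinstance(value, str):
        value = [value]  # wrap a single value so both cases look alike
    return [float(v) for v in value]


y_coords = as_floats(toy_dict['_atom_site.Cartn_y'])
solvent = as_floats(toy_dict['_exptl_crystal.density_percent_sol'])[0]
```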
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# the with statement closes the file even if parsing fails&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done with the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;&lt;br /&gt;
object. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that names&lt;br /&gt;
the directory in which to store the downloaded PDB files.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
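&lt;br /&gt;
To make the PDB-style subdirectories concrete, here is a small sketch of where an&lt;br /&gt;
entry would end up. It assumes the usual "divided" layout of the PDB archive, in which&lt;br /&gt;
the subdirectory is named after the two middle characters of the identifier, and the&lt;br /&gt;
&amp;lt;code&amp;gt;pdbXXXX.ent&amp;lt;/code&amp;gt; file naming mentioned earlier. The function &amp;lt;code&amp;gt;divided_path&amp;lt;/code&amp;gt; is a&lt;br /&gt;
hypothetical helper, not a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; method.&lt;br /&gt;

```python
import posixpath


def divided_path(pdb_dir, pdb_code):
    """Build the path of an entry in a PDB-style divided directory tree."""
    code = pdb_code.lower()
    # The subdirectory is named after characters 2 and 3 of the identifier
    return posixpath.join(pdb_dir, code[1:3], "pdb%s.ent" % code)


path = divided_path("/data/pdb", "1FAT")
```

With these assumptions, entry 1FAT lands in the &amp;lt;code&amp;gt;fa&amp;lt;/code&amp;gt; subdirectory of &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;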
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but about their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
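&lt;br /&gt;
The following toy class sketches why the full id is needed and how the integer&lt;br /&gt;
shortcut can work. &amp;lt;code&amp;gt;ToyChain&amp;lt;/code&amp;gt; is a made-up stand-in that maps ids to residue&lt;br /&gt;
names. It is not the Bio.PDB &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; class, and the three residues are invented&lt;br /&gt;
for the example.&lt;br /&gt;

```python
class ToyChain(object):
    """Toy stand-in for a chain: maps full residue ids to residue names."""

    def __init__(self):
        self._residues = {}

    def add(self, full_id, resname):
        self._residues[full_id] = resname

    def __getitem__(self, key):
        # A bare integer is expanded to the full id with a blank
        # hetero-flag and a blank insertion code.
        if isinstance(key, int):
            key = (' ', key, ' ')
        return self._residues[key]


chain = ToyChain()
# Three residues that share sequence identifier 100 (invented data)
chain.add((' ', 100, ' '), 'SER')
chain.add(('H_GLC', 100, 'A'), 'GLC')
chain.add(('W', 100, ' '), 'HOH')

amino_acid = chain[100]
glucose = chain[('H_GLC', 100, 'A')]
water = chain[('W', 100, ' ')]
```

Without the hetero-flag, the serine and the water above would collide on the same key.&lt;br /&gt;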
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots stand for spaces here. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (ie. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
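&lt;br /&gt;
The forwarding idea can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the actual Bio.PDB code: the class names &amp;lt;code&amp;gt;SimpleAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedWrapper&amp;lt;/code&amp;gt; are hypothetical, and only the method name &amp;lt;code&amp;gt;disordered_select&amp;lt;/code&amp;gt; is borrowed from the text above:&lt;br /&gt;

```python
class SimpleAtom:
    """Hypothetical minimal atom: a name, an altloc and an occupancy."""

    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_occupancy(self):
        return self.occupancy


class DisorderedWrapper:
    """Hypothetical wrapper that forwards unknown attributes to one child."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the child with the highest occupancy.
        self.selected = max(self.children.values(), key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for "uncaught" names.
        return getattr(self.selected, attr)


og = DisorderedWrapper()
og.add(SimpleAtom('OG', 'A', 0.3))
og.add(SimpleAtom('OG', 'B', 0.7))
print(og.get_occupancy())  # forwarded to the altloc B atom
og.disordered_select('A')
print(og.get_occupancy())  # forwarded to the altloc A atom
```

Code that only calls ordinary atom methods never needs to know that it is talking to a wrapper.&lt;br /&gt;
&lt;br /&gt;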
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught method&lt;br /&gt;
calls to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (ie.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (eg. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
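&lt;br /&gt;
For the curious, overloading the minus operator in this way takes very little code. The sketch below uses a hypothetical &amp;lt;code&amp;gt;Point&amp;lt;/code&amp;gt; class as a stand-in for an atom; it is not the Bio.PDB implementation:&lt;br /&gt;

```python
import math


class Point:
    """Hypothetical stand-in for an atom with x, y, z coordinates."""

    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Return the Euclidean distance, not a difference vector.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = Point(0.0, 0.0, 0.0)
ca2 = Point(3.0, 4.0, 0.0)
print(ca1 - ca2)  # 5.0
```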
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
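&lt;br /&gt;
If you want to see what such a calculation involves, here is a self-contained sketch of a torsion angle computed from four points in plain Python. It uses the common atan2 formulation and my own helper names (&amp;lt;code&amp;gt;sub&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dot&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;cross&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;dihedral&amp;lt;/code&amp;gt;); I have not checked it line by line against the Bio.PDB source, so treat it as an illustration only:&lt;br /&gt;

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle p1-p2-p3-p4 in radians, in the range -pi to pi."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1 = cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)  # normal of the plane through p2, p3, p4
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.atan2(y, x)


cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(round(math.degrees(cis), 3), round(math.degrees(trans), 3))
```

The two test cases are a cis arrangement (0 degrees) and a trans arrangement (180 degrees).&lt;br /&gt;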
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. However, it is possible to use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
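&lt;br /&gt;
To make the counting idea concrete, here is a minimal sketch in plain Python. It is not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code: the function name &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; is hypothetical, the side chain direction is simply passed in as a vector (for instance C&amp;alpha; to C&amp;beta;), and points lying exactly on the dividing plane are counted on the side chain side by assumption:&lt;br /&gt;

```python
def half_sphere_counts(center, direction, others, radius=13.0):
    """Count points within radius of center, split by the plane normal to direction.

    Returns (up, down): up counts points on the side that direction points to,
    down counts points on the opposite side.
    """
    up = down = 0
    for point in others:
        offset = tuple(p - c for p, c in zip(point, center))
        if sum(o * o for o in offset) > radius * radius:
            continue  # outside the sphere, ignore
        if sum(o * d for o, d in zip(offset, direction)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


neighbours = [(0.0, 0.0, 5.0), (1.0, 1.0, 2.0), (0.0, 0.0, -3.0), (20.0, 0.0, 0.0)]
print(half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), neighbours))
```

In this toy example two neighbours fall in the upper half sphere, one in the lower half sphere, and the fourth is further away than 13 Å and is ignored.&lt;br /&gt;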
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi*120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
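&lt;br /&gt;
For readers who want to see what a rotation around an axis boils down to, here is a plain Python sketch based on Rodrigues' rotation formula. The function name &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; is my own, it uses the right-hand rule for the sign of the angle, and I make no claim that its sign convention matches that of &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt;:&lt;br /&gt;

```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate vector v by theta radians about axis (right-hand rule)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)  # unit vector along the axis
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    # Rodrigues: v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t))
    return tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    )


rotated = rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 2.0), math.pi / 2)
print([round(x, 6) for x in rotated])
```

Here the x axis is rotated by 90 degrees about the z axis and ends up on the y axis.&lt;br /&gt;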
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix computations'', 2nd ed., Golub, G. &amp;amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
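&lt;br /&gt;
The remark about right multiplication means that a coordinate is treated as a row vector: the new coordinate is the old coordinate times the rotation matrix, plus the translation. A small sketch in plain Python, with a hypothetical helper &amp;lt;code&amp;gt;apply_rotran&amp;lt;/code&amp;gt; and made-up rotation and translation values:&lt;br /&gt;

```python
def apply_rotran(coord, rot, tran):
    """Return coord . rot + tran, with coord treated as a row vector."""
    rotated = tuple(
        sum(coord[i] * rot[i][j] for i in range(3)) for j in range(3)
    )
    return tuple(r + t for r, t in zip(rotated, tran))


# Made-up example values: a rotation matrix and a translation vector.
rot = ((0.0, 1.0, 0.0),
       (-1.0, 0.0, 0.0),
       (0.0, 0.0, 1.0))
tran = (1.0, 2.0, 3.0)
print(apply_rotran((1.0, 0.0, 0.0), rot, tran))  # (1.0, 3.0, 3.0)
```

Multiplying on the left instead would apply the transposed (that is, inverse) rotation, so the order matters.&lt;br /&gt;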
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:41:11Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How do I download the entire PDB? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As with PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
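Keep in mind that the values are strings, and that recent Biopython&lt;br /&gt;
releases return a list even for tags with a single value. The sketch below&lt;br /&gt;
handles both cases; the miniature mmCIF text and the helper &amp;lt;code&amp;gt;as_floats&amp;lt;/code&amp;gt;&lt;br /&gt;
are made up for illustration, and passing an open handle instead of a&lt;br /&gt;
file name is assumed to be supported (check your Biopython version):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from io import StringIO&lt;br /&gt;
from Bio.PDB.MMCIF2Dict import MMCIF2Dict&lt;br /&gt;
&lt;br /&gt;
# Hypothetical miniature mmCIF file, just enough for the example&lt;br /&gt;
cif_text = '''data_demo&lt;br /&gt;
_exptl_crystal.density_percent_sol 55.0&lt;br /&gt;
loop_&lt;br /&gt;
_atom_site.id&lt;br /&gt;
_atom_site.Cartn_y&lt;br /&gt;
1 1.0&lt;br /&gt;
2 2.5&lt;br /&gt;
'''&lt;br /&gt;
&lt;br /&gt;
def as_floats(value):&lt;br /&gt;
    # Accept either a single string or a list of strings&lt;br /&gt;
    if not isinstance(value, list):&lt;br /&gt;
        value = [value]&lt;br /&gt;
    return [float(v) for v in value]&lt;br /&gt;
&lt;br /&gt;
mmcif_dict = MMCIF2Dict(StringIO(cif_text))&lt;br /&gt;
sc = as_floats(mmcif_dict['_exptl_crystal.density_percent_sol'])[0]&lt;br /&gt;
y_list = as_floats(mmcif_dict['_atom_site.Cartn_y'])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;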
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother, you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, ie. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but about their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        # Residue objects expose their name via get_resname()&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
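A minimal sketch of &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; in action. To keep it&lt;br /&gt;
self-contained, the structure is a hand-built toy (five glycines with one&lt;br /&gt;
C&amp;amp;alpha; atom each, made-up coordinates) and the output goes to a&lt;br /&gt;
temporary directory; in practice you would pass a parsed structure:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
import tempfile&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB import extract&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
from Bio.PDB.Chain import Chain&lt;br /&gt;
from Bio.PDB.Model import Model&lt;br /&gt;
from Bio.PDB.Structure import Structure&lt;br /&gt;
&lt;br /&gt;
# Toy structure: chain A with five glycines, one CA atom each&lt;br /&gt;
structure = Structure('toy')&lt;br /&gt;
model = Model(0)&lt;br /&gt;
chain = Chain('A')&lt;br /&gt;
structure.add(model)&lt;br /&gt;
model.add(chain)&lt;br /&gt;
for i in range(1, 6):&lt;br /&gt;
    residue = Residue((' ', i, ' '), 'GLY', ' ')&lt;br /&gt;
    coord = numpy.array([3.8 * i, 0.0, 0.0], 'f')&lt;br /&gt;
    residue.add(Atom('CA', coord, 20.0, 1.0, ' ', ' CA ', i, 'C'))&lt;br /&gt;
    chain.add(residue)&lt;br /&gt;
&lt;br /&gt;
# Write residues 2 to 4 (inclusive) of chain A to a new PDB file&lt;br /&gt;
out_name = os.path.join(tempfile.mkdtemp(), 'fragment.pdb')&lt;br /&gt;
extract(structure, 'A', 2, 4, out_name)&lt;br /&gt;
&lt;br /&gt;
# Collect the ATOM records that were written&lt;br /&gt;
with open(out_name) as handle:&lt;br /&gt;
    atom_lines = [line for line in handle if line.startswith('ATOM')]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;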
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
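As a small illustration of these rules, the sketch below builds a toy chain&lt;br /&gt;
by hand (the residues are made up; normally the parser creates them) in which&lt;br /&gt;
an amino acid and a glucose share sequence identifier 100:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Chain import Chain&lt;br /&gt;
from Bio.PDB.Residue import Residue&lt;br /&gt;
&lt;br /&gt;
# Toy chain: an amino acid, a glucose and a water; the glucose&lt;br /&gt;
# deliberately reuses sequence identifier 100&lt;br /&gt;
chain = Chain('A')&lt;br /&gt;
chain.add(Residue((' ', 100, ' '), 'GLY', ' '))&lt;br /&gt;
chain.add(Residue(('H_GLC', 100, ' '), 'GLC', ' '))&lt;br /&gt;
chain.add(Residue(('W', 101, ' '), 'HOH', ' '))&lt;br /&gt;
&lt;br /&gt;
glycine = chain[100]                    # shortcut for (' ', 100, ' ')&lt;br /&gt;
glucose = chain[('H_GLC', 100, ' ')]    # hetero-residues need the full id&lt;br /&gt;
&lt;br /&gt;
# Keep only residues with a blank hetero-flag (no ligands, no waters)&lt;br /&gt;
plain = [r.get_resname() for r in chain if r.get_id()[0] == ' ']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Filtering on the first element of the id tuple is a convenient way to skip&lt;br /&gt;
ligands and solvent when looping over a chain.&lt;br /&gt;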
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (eg. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots here stand for spaces. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (ie. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (ie.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
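To make the atom-level behaviour described above concrete, here is a minimal&lt;br /&gt;
sketch that assembles a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; by hand. The two&lt;br /&gt;
OG positions, their coordinates and occupancies are invented for the example;&lt;br /&gt;
normally the parser builds these objects for you:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy&lt;br /&gt;
from Bio.PDB.Atom import Atom, DisorderedAtom&lt;br /&gt;
&lt;br /&gt;
# Two hypothetical positions of the same OG atom (altloc A and B)&lt;br /&gt;
og_a = Atom('OG', numpy.array([1.0, 0.0, 0.0], 'f'), 20.0, 0.4, 'A', ' OG ', 1, 'O')&lt;br /&gt;
og_b = Atom('OG', numpy.array([0.0, 1.0, 0.0], 'f'), 20.0, 0.6, 'B', ' OG ', 2, 'O')&lt;br /&gt;
&lt;br /&gt;
og = DisorderedAtom('OG')&lt;br /&gt;
og.disordered_add(og_a)&lt;br /&gt;
og.disordered_add(og_b)&lt;br /&gt;
&lt;br /&gt;
default_altloc = og.get_altloc()        # position with the highest occupancy&lt;br /&gt;
og.disordered_select('A')&lt;br /&gt;
selected_altloc = og.get_altloc()       # now forwarded to the altloc A atom&lt;br /&gt;
altlocs = og.disordered_get_id_list()   # all altloc identifiers stored&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;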
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (eg. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
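&lt;br /&gt;
For illustration only: the value returned by the subtraction is the ordinary Euclidean&lt;br /&gt;
distance between the two coordinate triples (in Å for PDB files). The plain-Python sketch&lt;br /&gt;
below reproduces that arithmetic on made-up coordinates; &amp;lt;code&amp;gt;atom_distance&amp;lt;/code&amp;gt; is a&lt;br /&gt;
hypothetical helper and not part of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def atom_distance(coord1, coord2):
    """Euclidean distance between two (x, y, z) coordinate triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coord1, coord2)))


# Made-up coordinates standing in for ca1.get_coord() and ca2.get_coord()
ca1_coord = (0.0, 0.0, 0.0)
ca2_coord = (3.0, 4.0, 0.0)
print(atom_distance(ca1_coord, ca2_coord))  # 5.0
```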
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
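&lt;br /&gt;
The angle is measured at the middle point (&amp;lt;code&amp;gt;vector2&amp;lt;/code&amp;gt;) and is returned in radians.&lt;br /&gt;
A minimal sketch of the same calculation in plain Python, using made-up coordinates;&lt;br /&gt;
&amp;lt;code&amp;gt;angle_at_vertex&amp;lt;/code&amp;gt; is a hypothetical name used only for this illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def angle_at_vertex(p1, p2, p3):
    """Angle in radians between p1-p2 and p3-p2, i.e. with p2 as the vertex."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Three made-up points forming a right angle at the origin
right_angle = angle_at_vertex((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(math.degrees(right_angle))
```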
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
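&lt;br /&gt;
As a rough sketch of what a torsion angle is: it is the angle between the plane through&lt;br /&gt;
points 1-2-3 and the plane through points 2-3-4. The plain-Python version below uses the&lt;br /&gt;
common atan2 formulation and returns radians; &amp;lt;code&amp;gt;torsion&amp;lt;/code&amp;gt; and its helpers are hypothetical&lt;br /&gt;
names, and the sign convention is an assumption that should be checked against&lt;br /&gt;
&amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; before relying on it.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p1, p2, p3, p4):
    """Torsion angle in radians defined by four points (sign convention assumed)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1 = _cross(b1, b2)  # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)  # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(_cross(n1, n2), b2) / b2_len
    x = _dot(n1, n2)
    return math.atan2(y, x)


# Made-up points with a torsion of 90 degrees in magnitude
phi = torsion((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(abs(math.degrees(phi)))
```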
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
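&lt;br /&gt;
To make the query concrete, here is a brute-force sketch of what a contact search computes:&lt;br /&gt;
all pairs of points closer than a cutoff. It is quadratic in the number of atoms, which is&lt;br /&gt;
exactly the cost the KD tree avoids. &amp;lt;code&amp;gt;contacts_within&amp;lt;/code&amp;gt; is a hypothetical helper working on&lt;br /&gt;
made-up coordinates; it is not how &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt; is implemented.&lt;br /&gt;
&lt;br /&gt;
```python
import itertools
import math


def contacts_within(coords, cutoff):
    """Return index pairs (i, j) whose coordinates lie within `cutoff` of each other."""
    pairs = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        if cutoff >= math.dist(a, b):
            pairs.append((i, j))
    return pairs


# Made-up coordinates; with Bio.PDB these would come from atom.get_coord().
# NeighborSearch exposes comparable queries (search and search_all); see the
# API documentation for their exact arguments.
coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contacts_within(coords, 1.5))  # [(0, 1)]
```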
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms. Polypeptides can be built using a C-N (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt;)&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; (&amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;) distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, the same builders can also be used&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and consequently the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
method), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
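&lt;br /&gt;
The counting idea behind HSE can be sketched in a few lines of plain Python: take every&lt;br /&gt;
C&amp;amp;alpha; within 13 Å of the residue's own C&amp;amp;alpha; and decide on which side of the dividing plane&lt;br /&gt;
it falls. This is only an illustration on made-up coordinates. &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; is a&lt;br /&gt;
hypothetical helper, the direction vector is supplied by the caller, and the treatment of&lt;br /&gt;
points lying exactly on the plane is an assumption, not the behaviour of &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def half_sphere_counts(center, direction, other_cas, radius=13.0):
    """Count points within `radius` of `center`, split by the side of the plane
    through `center` perpendicular to `direction` on which they lie."""
    up = down = 0
    for ca in other_cas:
        offset = [a - b for a, b in zip(ca, center)]
        if math.sqrt(sum(x * x for x in offset)) > radius:
            continue  # outside the sphere, not counted at all
        if sum(o * d for o, d in zip(offset, direction)) >= 0.0:
            up += 1  # same side as the direction vector (assumed to include the plane)
        else:
            down += 1
    return up, down


# Made-up neighbours around a residue at the origin, side chain pointing along +z
neighbours = [(0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (1.0, 1.0, 2.0), (0.0, 0.0, 20.0)]
print(half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), neighbours))  # (2, 1)
```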
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
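&lt;br /&gt;
For readers who want to see the arithmetic behind a rotation about an arbitrary axis, the&lt;br /&gt;
sketch below applies Rodrigues' rotation formula in plain Python. It follows the right-hand&lt;br /&gt;
rule; whether &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; uses the same sign convention is an assumption to verify, and&lt;br /&gt;
&amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; is a hypothetical name used only here.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rotate_about_axis(vec, axis, theta):
    """Rotate `vec` by `theta` radians about `axis` (Rodrigues' rotation formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the rotation axis
    k_dot_v = sum(a * b for a, b in zip(k, vec))
    k_cross_v = (
        k[1] * vec[2] - k[2] * vec[1],
        k[2] * vec[0] - k[0] * vec[2],
        k[0] * vec[1] - k[1] * vec[0],
    )
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return tuple(
        v * cos_t + c * sin_t + kk * k_dot_v * (1.0 - cos_t)
        for v, c, kk in zip(vec, k_cross_v, k)
    )


# Rotate the x unit vector by 90 degrees about the z axis
rotated = rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), math.pi / 2)
print([round(x, 6) for x in rotated])
```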
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
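&lt;br /&gt;
The quantity being minimized is the root-mean-square deviation over paired atoms. The sketch&lt;br /&gt;
below only evaluates that quantity for two coordinate lists as they stand; it does not&lt;br /&gt;
perform the SVD-based fit. &amp;lt;code&amp;gt;rmsd&amp;lt;/code&amp;gt; is a hypothetical helper on made-up coordinates.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rmsd(fixed_coords, moving_coords):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(fixed_coords) != len(moving_coords):
        raise ValueError("both lists must contain the same number of atoms")
    total = sum(
        (a - b) ** 2
        for p, q in zip(fixed_coords, moving_coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(total / len(fixed_coords))


# Two made-up atom lists, the second shifted by 1 along z
fixed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moving = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(fixed, moving))  # 1.0
```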
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:40:26Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active-site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
: Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
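&lt;br /&gt;
To illustrate the two layouts, the sketch below builds the expected local path for an entry.&lt;br /&gt;
The &amp;lt;code&amp;gt;pdbXXXX.ent&amp;lt;/code&amp;gt; file name comes from the text above; naming the subdirectory after the&lt;br /&gt;
two middle characters of the PDB ID is an assumption about the PDB-style layout, and&lt;br /&gt;
&amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; is a hypothetical helper rather than a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
```python
import os


def local_pdb_path(pdb_id, root, flat=False):
    """Expected location of a downloaded entry below `root`.

    flat=True mimics storing everything in one directory (the -d option);
    otherwise the file goes into a subdirectory named after the two middle
    characters of the ID (assumed layout).
    """
    code = pdb_id.lower()
    filename = "pdb%s.ent" % code
    if flat:
        return os.path.join(root, filename)
    return os.path.join(root, code[1:3], filename)


print(local_pdb_path("1FAT", "/data/pdb"))
print(local_pdb_path("1FAT", "/data/pdb", flat=True))
```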
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). The PDBParser object can handle this in two ways: a&lt;br /&gt;
restrictive way and a permissive way (THIS IS NOW THE DEFAULT). The&lt;br /&gt;
restrictive way used to be the default, but people seemed to think that&lt;br /&gt;
Bio.PDB 'crashed' due to a bug (hah!), so I changed it. If you ever&lt;br /&gt;
encounter a real bug, please tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
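from Bio.PDB import PDBIO, Select&lt;br /&gt;
&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        # The residue name is returned by get_resname()&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;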
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
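&lt;br /&gt;
The role of the hetero-flag can be illustrated without Bio.PDB at all. The sketch below is only an illustration: it uses a plain dictionary as a stand-in for a chain, and the helper &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; is a hypothetical name, not part of the Bio.PDB API. It shows how an integer shortcut expands to a full id, and how an amino acid and a water can share sequence identifier 100 without clashing:&lt;br /&gt;

```python
def full_residue_id(key):
    """Expand an integer shortcut to a (hetero-flag, sequence id, insertion code) tuple."""
    if isinstance(key, int):
        # Blank hetero-flag and blank insertion code
        return (" ", key, " ")
    return key


# Toy stand-in for a chain (an assumption for illustration, not a Bio.PDB Chain):
# an amino acid and a water share sequence identifier 100.
toy_chain = {
    (" ", 100, " "): "GLY",
    ("W", 100, " "): "HOH",
    ("H_GLC", 101, "A"): "GLC",
}

amino_acid = toy_chain[full_residue_id(100)]
water = toy_chain[full_residue_id(("W", 100, " "))]
glucose = toy_chain[full_residue_id(("H_GLC", 101, "A"))]
```

Without the hetero-flag, the first two entries would have the same key and one of them would be lost.&lt;br /&gt;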
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
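&lt;br /&gt;
The naming rule can be sketched in a few lines of plain Python. This is a simplified illustration, not the parser's actual code: &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; is a hypothetical helper, and it assumes that the atom read later is the one that keeps its spaces when a clash occurs:&lt;br /&gt;

```python
def atom_ids(fullnames):
    """Turn 4-character PDB atom names into ids, keeping spaces only on a clash."""
    ids = []
    seen = set()
    for fullname in fullnames:
        name = fullname.strip()
        if name in seen:
            # Stripping would give two atoms the same id, so keep the spaces
            name = fullname
        seen.add(name)
        ids.append(name)
    return ids


# ' CA ' is a C-alpha atom, 'CA  ' is a calcium atom in the same residue
ids = atom_ids([" N  ", " CA ", " C  ", "CA  "])
```

Here the calcium atom ends up with the id &amp;lt;code&amp;gt;'CA  '&amp;lt;/code&amp;gt; (spaces kept), while the C&amp;amp;alpha; atom gets the id &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;.&lt;br /&gt;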
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
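&lt;br /&gt;
The forwarding idea can be sketched with two toy classes. These are hypothetical stand-ins written for this FAQ entry (the names &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedAtom&amp;lt;/code&amp;gt; are made up), not the real &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; implementations, but they show how a wrapper can behave like the selected child:&lt;br /&gt;

```python
class ToyAtom(object):
    """Minimal stand-in for one alternative position of an atom."""

    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_altloc(self):
        return self.altloc

    def get_occupancy(self):
        return self.occupancy


class ToyDisorderedAtom(object):
    """Stores all positions of one atom and acts like the selected one."""

    def __init__(self, atoms):
        # Children are indexed by their altloc specifier
        self.children = dict((a.get_altloc(), a) for a in atoms)
        # Default selection: the position with the highest occupancy
        self.selected = max(atoms, key=lambda a: a.get_occupancy())

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, attr):
        # Only reached for attributes the wrapper itself does not define
        if attr in ("children", "selected"):
            raise AttributeError(attr)
        return getattr(self.selected, attr)


og = ToyDisorderedAtom([ToyAtom("OG", "A", 0.4), ToyAtom("OG", "B", 0.6)])
default_altloc = og.get_altloc()    # forwarded to the altloc B position
og.disordered_select("A")
selected_altloc = og.get_altloc()   # now forwarded to the altloc A position
```

Code that never calls &amp;lt;code&amp;gt;disordered_select&amp;lt;/code&amp;gt; simply sees an object that behaves like a single atom.&lt;br /&gt;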
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
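For readers curious how a minus operator can return a distance, here is a minimal sketch of the overloading trick. The class &amp;lt;code&amp;gt;PointAtom&amp;lt;/code&amp;gt; is a made-up stand-in that only holds coordinates; it is not the Bio.PDB &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; class:&lt;br /&gt;

```python
import math


class PointAtom(object):
    """Toy atom that only stores its coordinates."""

    def __init__(self, coord):
        self.coord = coord

    def __sub__(self, other):
        # Overloaded minus: return the Euclidean distance, not a vector
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = PointAtom((0.0, 0.0, 0.0))
ca2 = PointAtom((3.0, 4.0, 0.0))
distance = ca1 - ca2
```

Note that with this design the result is a plain number, so the order of the two atoms does not matter.&lt;br /&gt;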
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
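To make clear what is being measured, the following plain Python sketch computes the same kind of angle from raw coordinates. It assumes the angle is taken at the middle point and is expressed in radians; the function name &amp;lt;code&amp;gt;angle&amp;lt;/code&amp;gt; is chosen for this example only and is not the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module's code:&lt;br /&gt;

```python
import math


def angle(p1, p2, p3):
    """Angle in radians at p2, between the directions from p2 to p1 and from p2 to p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] so rounding errors cannot break acos
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


right_angle = angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
straight = angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-2.0, 0.0, 0.0))
```

Use &amp;lt;code&amp;gt;math.degrees&amp;lt;/code&amp;gt; to convert a result in radians to degrees.&lt;br /&gt;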
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
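As an illustration of what a torsion angle is, the sketch below computes one from four points using the common atan2 formulation. It is a stand-alone example with made-up helper names, not the code behind &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt;, and the sign convention used here is an assumption that may differ from the one in the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians around the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    # Normals of the planes (p1, p2, p3) and (p2, p3, p4)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


cis = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0, 1.0))
trans = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (-1.0, 0.0, 1.0))
```

In the cis arrangement the first and last points lie on the same side of the central bond, so the angle is zero; in the trans arrangement its magnitude is 180 degrees.&lt;br /&gt;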
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. However, it is possible to use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB.&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. The latter class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
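&lt;br /&gt;
The counting idea behind HSE can be sketched in plain Python. This is a simplified illustration rather than the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; implementation: the function name &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; is made up, and the side chain direction is simply passed in as a vector instead of being derived from the C&amp;amp;alpha; and C&amp;amp;beta; positions:&lt;br /&gt;

```python
import math


def half_sphere_counts(center, direction, others, radius=13.0):
    """Count points within radius of center, split by the plane through
    center that is perpendicular to direction."""
    up = down = 0
    for point in others:
        offset = [p - c for p, c in zip(point, center)]
        dist = math.sqrt(sum(o * o for o in offset))
        if dist > radius:
            continue  # outside the sphere
        # Positive projection: the point lies on the side chain's side
        side = sum(o * d for o, d in zip(offset, direction))
        if side >= 0:
            up += 1
        else:
            down += 1
    return up, down


ca = (0.0, 0.0, 0.0)
side_chain_direction = (0.0, 0.0, 1.0)
neighbours = [(1.0, 0.0, 5.0), (0.0, 2.0, 8.0), (3.0, 0.0, -4.0), (0.0, 0.0, 20.0)]
hse_up, hse_down = half_sphere_counts(ca, side_chain_direction, neighbours)
contact_number = hse_up + hse_down
```

In this toy example two neighbours fall in the half sphere on the side chain's side, one falls in the opposite half sphere, and one lies beyond 13 Å and is ignored. The sum of the two counts is the contact number.&lt;br /&gt;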
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also the next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
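&lt;br /&gt;
For readers who want to see the geometry without Bio.PDB at hand, the same construction&lt;br /&gt;
can be sketched in pure Python with Rodrigues' rotation formula. This is a stand-in,&lt;br /&gt;
not the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; implementation: the function names&lt;br /&gt;
&amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;virtual_cb&amp;lt;/code&amp;gt; are hypothetical,&lt;br /&gt;
the coordinates are made up, and the right-hand rule is assumed for the sign of the angle.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rotate_about_axis(point, axis, theta):
    """Rotate `point` by `theta` radians about `axis` (both (x, y, z) tuples).

    Rodrigues' formula: v*cos(t) + (k x v)*sin(t) + k*(k.v)*(1 - cos(t)),
    where k is the unit vector along the axis.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Cross product k x point
    cross = (k[1] * point[2] - k[2] * point[1],
             k[2] * point[0] - k[0] * point[2],
             k[0] * point[1] - k[1] * point[0])
    dot = sum(kj * pj for kj, pj in zip(k, point))
    return tuple(p * cos_t + c * sin_t + kj * dot * (1.0 - cos_t)
                 for p, c, kj in zip(point, cross, k))


def virtual_cb(n, ca, c):
    """Place a virtual CB by rotating N by -120 degrees about the CA-C axis."""
    n0 = tuple(a - b for a, b in zip(n, ca))       # centre N on CA
    c0 = tuple(a - b for a, b in zip(c, ca))       # centre C on CA
    cb0 = rotate_about_axis(n0, c0, -2.0 * math.pi / 3.0)
    return tuple(a + b for a, b in zip(cb0, ca))   # move back onto CA


# Made-up, idealised coordinates (not taken from a real structure)
cb = virtual_cb(n=(2.0, 1.0, 1.0), ca=(1.0, 1.0, 1.0), c=(1.0, 1.0, 2.0))
print(cb)
```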
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix Computations'', 2nd ed., Golub, G. &amp;amp; Van Loan, C. (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
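&lt;br /&gt;
The remark that the rotation is right multiplying can be illustrated with a small&lt;br /&gt;
pure-Python sketch: each coordinate is treated as a row vector, multiplied by the rotation&lt;br /&gt;
matrix from the right, and then translated. The helper names &amp;lt;code&amp;gt;apply_rotran&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;rmsd&amp;lt;/code&amp;gt; are hypothetical and the numbers are made up; the sketch&lt;br /&gt;
does not compute the optimal superposition, it only applies a given one.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def apply_rotran(coords, rot, tran):
    """Return the coordinates transformed as coord * rot + tran (row vectors)."""
    moved = []
    for x, y, z in coords:
        moved.append((
            x * rot[0][0] + y * rot[1][0] + z * rot[2][0] + tran[0],
            x * rot[0][1] + y * rot[1][1] + z * rot[2][1] + tran[1],
            x * rot[0][2] + y * rot[1][2] + z * rot[2][2] + tran[2],
        ))
    return moved


def rmsd(a, b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(a) != len(b):
        raise ValueError("both lists must contain the same number of atoms")
    total = sum((p - q) ** 2 for u, v in zip(a, b) for p, q in zip(u, v))
    return math.sqrt(total / len(a))


# Made-up example: a 90 degree rotation about z followed by a shift
rot = [[0.0, 1.0, 0.0],
       [-1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
tran = (1.0, 1.0, 1.0)
moving = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 2.0)]
fixed = [(1.0, 2.0, 1.0), (0.0, 1.0, 1.0), (1.0, 1.0, 3.0)]

before = rmsd(moving, fixed)
after = rmsd(apply_rotran(moving, rot, tran), fixed)
print(before, after)
```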
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:39:54Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Is there support for molecular graphics? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active-site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308&amp;amp;ndash;2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are already quite a few Python based/Python&lt;br /&gt;
aware solutions that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
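&lt;br /&gt;
Because a tag can map either to a single value or to a list of values, code that reads&lt;br /&gt;
the dictionary may want to treat both cases alike. The following sketch shows one way&lt;br /&gt;
to do that on a hand-made dictionary. The helper &amp;lt;code&amp;gt;as_list&amp;lt;/code&amp;gt; is hypothetical,&lt;br /&gt;
the numbers are made up, and it is an assumption here that the values are stored as strings.&lt;br /&gt;
&lt;br /&gt;
```python
def as_list(value):
    """Return `value` unchanged if it is already a list, else wrap it in one."""
    if isinstance(value, list):
        return value
    return [value]


# Hand-made stand-in for a dictionary built from an mmCIF file;
# the values are invented and assumed to be strings.
mmcif_like = {
    '_exptl_crystal.density_percent_sol': '54.2',
    '_atom_site.Cartn_y': ['10.5', '11.25', '9.0'],
}

# Both tags can now be handled by the same code path
solvent = [float(v) for v in as_list(mmcif_like['_exptl_crystal.density_percent_sol'])]
y_coords = [float(v) for v in as_list(mmcif_like['_atom_site.Cartn_y'])]
print(solvent)
print(y_coords)
```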
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
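&lt;br /&gt;
The difference between the two storage layouts can be sketched as follows. This is only&lt;br /&gt;
an illustration: the function &amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; and its &amp;lt;code&amp;gt;flat&amp;lt;/code&amp;gt;&lt;br /&gt;
argument are hypothetical, and the subdirectories are assumed to be named after the two&lt;br /&gt;
middle characters of the identifier, as on the 'divided' tree of the PDB ftp site.&lt;br /&gt;
&lt;br /&gt;
```python
import os


def local_pdb_path(pdb_id, root, flat=False):
    """Build the path where a downloaded entry would be stored.

    flat=True puts every file directly in `root`; otherwise the file goes
    into a subdirectory named after characters 2 and 3 of the identifier.
    """
    code = pdb_id.lower()
    filename = 'pdb%s.ent' % code
    if flat:
        return os.path.join(root, filename)
    return os.path.join(root, code[1:3], filename)


flat_path = local_pdb_path('1FAT', '/data/pdb', flat=True)
divided_path = local_pdb_path('1FAT', '/data/pdb')
print(flat_path)
print(divided_path)
```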
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can handle&lt;br /&gt;
such files in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
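&lt;br /&gt;
The difference between the two states boils down to what happens when a duplicate is&lt;br /&gt;
found. Here is a minimal sketch of that decision for duplicate residue identifiers.&lt;br /&gt;
It is not the parser's code: &amp;lt;code&amp;gt;build_chain&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DuplicateError&amp;lt;/code&amp;gt;&lt;br /&gt;
are hypothetical names, and plain integers stand in for residue identifiers.&lt;br /&gt;
&lt;br /&gt;
```python
import warnings


class DuplicateError(Exception):
    """Raised in the strict state when an identifier occurs twice (hypothetical)."""


def build_chain(residue_ids, permissive=True):
    """Collect residue identifiers, skipping or rejecting duplicates."""
    seen = set()
    kept = []
    for res_id in residue_ids:
        if res_id in seen:
            message = 'residue %r defined twice' % (res_id,)
            if not permissive:
                raise DuplicateError(message)  # strict: stop at the first problem
            warnings.warn(message)  # permissive: report it and leave the residue out
            continue
        seen.add(res_id)
        kept.append(res_id)
    return kept


# A clean input behaves the same in both states
print(build_chain([10, 11, 12], permissive=False))
```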
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
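&lt;br /&gt;
One way to picture the handling of such broken chains is to group the records by&lt;br /&gt;
chain identifier rather than by their position in the file. The sketch below does that&lt;br /&gt;
for made-up (chain id, residue number) pairs. The function &amp;lt;code&amp;gt;group_by_chain&amp;lt;/code&amp;gt;&lt;br /&gt;
is hypothetical and only illustrates the idea, not how the parser is written.&lt;br /&gt;
&lt;br /&gt;
```python
def group_by_chain(records):
    """Group (chain id, residue number) records by chain.

    Chains are returned in order of first appearance, so residues of a
    chain that shows up again later are appended to the existing chain.
    """
    chains = {}
    order = []
    for chain_id, resseq in records:
        if chain_id not in chains:
            chains[chain_id] = []
            order.append(chain_id)
        chains[chain_id].append(resseq)
    return [(chain_id, chains[chain_id]) for chain_id in order]


# Chain A is interrupted by chain B and then continues
records = [('A', 1), ('A', 2), ('B', 1), ('A', 3)]
grouped = group_by_chain(records)
print(grouped)
```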
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(structure)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(structure)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
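&lt;br /&gt;
How the accept methods steer the output can be seen in a stripped-down sketch of the&lt;br /&gt;
pattern. The classes below are stand-ins written for this example and model only two&lt;br /&gt;
of the four hooks. Residues are plain dictionaries with made-up content, and&lt;br /&gt;
&amp;lt;code&amp;gt;filter_atoms&amp;lt;/code&amp;gt; is a hypothetical replacement for the writing step.&lt;br /&gt;
&lt;br /&gt;
```python
class Select(object):
    """Stand-in selector whose hooks accept everything by default."""

    def accept_residue(self, residue):
        return 1

    def accept_atom(self, atom):
        return 1


class GlyCASelect(Select):
    """Keep only the CA atoms of glycine residues."""

    def accept_residue(self, residue):
        return 1 if residue['name'] == 'GLY' else 0

    def accept_atom(self, atom):
        return 1 if atom == 'CA' else 0


def filter_atoms(residues, select=None):
    """Return the (residue name, atom name) pairs that pass the selector."""
    select = select or Select()
    kept = []
    for residue in residues:
        if not select.accept_residue(residue):
            continue  # a rejected residue hides all of its atoms
        for atom in residue['atoms']:
            if select.accept_atom(atom):
                kept.append((residue['name'], atom))
    return kept


residues = [
    {'name': 'GLY', 'atoms': ['N', 'CA', 'C', 'O']},
    {'name': 'ALA', 'atoms': ['N', 'CA', 'C', 'O', 'CB']},
]
selected = filter_atoms(residues, GlyCASelect())
print(selected)
```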
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
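&lt;br /&gt;
The parent/child bookkeeping behind this navigation can be pictured with a very small&lt;br /&gt;
stand-in class. This is a sketch and not Bio.PDB's &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;: it keeps only&lt;br /&gt;
an id, a parent and a dictionary of children, and for simplicity it uses a plain integer&lt;br /&gt;
as the residue id.&lt;br /&gt;
&lt;br /&gt;
```python
class Entity(object):
    """Tiny stand-in for one node in the SMCRA hierarchy (illustrative only)."""

    def __init__(self, id):
        self.id = id
        self.parent = None
        self.children = {}  # maps child id to child object

    def add(self, child):
        """Attach a child and remember this node as its parent."""
        child.parent = self
        self.children[child.id] = child
        return child

    def __iter__(self):
        return iter(self.children.values())

    def __getitem__(self, id):
        return self.children[id]

    def has_id(self, id):
        return id in self.children

    def get_parent(self):
        return self.parent


# Build a one-atom hierarchy: structure, model, chain, residue, atom
structure = Entity('X')
model = structure.add(Entity(0))
chain = model.add(Entity('A'))
residue = chain.add(Entity(100))
atom = residue.add(Entity('CA'))

print(atom.get_parent().get_parent().id)
print(chain.has_id(100))
```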
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
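&lt;br /&gt;
Going up in the hierarchy amounts to collecting each parent once. The sketch below&lt;br /&gt;
shows that step for a list of made-up atoms. The &amp;lt;code&amp;gt;Node&amp;lt;/code&amp;gt; class and the&lt;br /&gt;
&amp;lt;code&amp;gt;unique_parents&amp;lt;/code&amp;gt; function are hypothetical, and keeping the parents in order&lt;br /&gt;
of first appearance is a choice made for this example, not a documented property of&lt;br /&gt;
&amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
class Node(object):
    """Minimal object that only remembers its name and parent (illustrative)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def get_parent(self):
        return self.parent


def unique_parents(children):
    """Return each parent exactly once, in order of first appearance."""
    seen = set()
    parents = []
    for child in children:
        parent = child.get_parent()
        if id(parent) not in seen:
            seen.add(id(parent))
            parents.append(parent)
    return parents


# Two made-up residues with two atoms each
gly = Node('GLY 1')
ala = Node('ALA 2')
atoms = [Node('N', gly), Node('CA', gly), Node('N', ala), Node('CA', ala)]
parent_names = [res.name for res in unique_parents(atoms)]
print(parent_names)
```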
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
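&lt;br /&gt;
To make the shortcut concrete, here is a minimal pure-Python sketch of how an integer key can be expanded into a full residue id. The function &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; and the dictionary &amp;lt;code&amp;gt;chain_residues&amp;lt;/code&amp;gt; are hypothetical names used for illustration only; they are not part of Bio.PDB, and the residues are made up.&lt;br /&gt;
&lt;br /&gt;
```python
def full_residue_id(key):
    """Expand an integer shortcut into a (hetero-flag, sequence id, insertion code) tuple."""
    if isinstance(key, int):
        # Blank hetero-flag and blank insertion code, as described above
        return (" ", key, " ")
    return key


# Hypothetical stand-in for a chain: full residue ids mapped to residue names
chain_residues = {
    (" ", 100, " "): "SER",
    ("H_GLC", 100, "A"): "GLC",
    ("W", 100, " "): "HOH",
}

print(chain_residues[full_residue_id(100)])                   # the amino acid
print(chain_residues[full_residue_id(("H_GLC", 100, "A"))])   # the glucose
print(chain_residues[full_residue_id(("W", 100, " "))])       # the water
```
&lt;br /&gt;
All three entries share sequence identifier 100, which is exactly the situation the hetero-flag is meant to disambiguate.&lt;br /&gt;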
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots stand for spaces here. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
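&lt;br /&gt;
The forwarding behaviour described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the Bio.PDB implementation: the classes &amp;lt;code&amp;gt;AtomSketch&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedSketch&amp;lt;/code&amp;gt; are hypothetical, and the occupancies are made up.&lt;br /&gt;
&lt;br /&gt;
```python
class AtomSketch:
    """Hypothetical stand-in for one alternative position of an atom."""

    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_occupancy(self):
        return self.occupancy


class DisorderedSketch:
    """Hypothetical wrapper that behaves like its currently selected child."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the child with the highest occupancy
        self.selected = max(self.children.values(), key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, attr):
        # Only called for attributes the wrapper does not define itself,
        # so every other lookup is forwarded to the selected child
        return getattr(self.selected, attr)


og = DisorderedSketch()
og.add(AtomSketch("OG", "A", 0.4))
og.add(AtomSketch("OG", "B", 0.6))
print(og.altloc, og.get_occupancy())   # forwarded to the altloc B atom
og.disordered_select("A")
print(og.altloc, og.get_occupancy())   # forwarded to the altloc A atom
```
&lt;br /&gt;
Code that does not care about disorder simply uses the wrapper as if it were an ordinary atom; code that does care can switch the selection explicitly.&lt;br /&gt;
&lt;br /&gt;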
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
method calls to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
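&lt;br /&gt;
The value returned by the subtraction is the ordinary Euclidean distance between the two coordinate sets. A minimal sketch without Bio.PDB, using a hypothetical &amp;lt;code&amp;gt;distance&amp;lt;/code&amp;gt; helper and made-up coordinates:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def distance(coord1, coord2):
    """Euclidean distance between two (x, y, z) coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coord1, coord2)))


# Made-up coordinates, in Angstrom
ca1_coord = (0.0, 0.0, 0.0)
ca2_coord = (3.0, 4.0, 0.0)
print(distance(ca1_coord, ca2_coord))
```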
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
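&lt;br /&gt;
For reference, the quantity computed here is the angle at the middle point, in radians. The sketch below reproduces that calculation in plain Python; the &amp;lt;code&amp;gt;angle&amp;lt;/code&amp;gt; helper and the coordinates are made up for illustration and are not the Bio.PDB code.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def angle(p1, p2, p3):
    """Angle (in radians) at p2 between the directions towards p1 and p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against rounding errors before acos
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cos_angle)


# Three made-up points forming a right angle at the origin
print(math.degrees(angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))))
```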
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
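&lt;br /&gt;
A torsion angle can also be worked out by hand from the four coordinates. The sketch below uses the common atan2 formulation; it is an assumption that this matches the sign convention of &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; in every case, and the helper names and coordinates are made up.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians (between -pi and pi) defined by four points."""
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    b3 = sub(p4, p3)
    n1 = cross(b1, b2)   # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)   # normal of the plane through p2, p3, p4
    b2_length = math.sqrt(dot(b2, b2))
    return math.atan2(b2_length * dot(b1, n2), dot(n1, n2))


# Made-up points: the outer bonds are perpendicular when viewed along the central bond
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
print(math.degrees(dihedral(*points)))
```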
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Polypeptide import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is possible to use &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| border="1"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. The latter class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
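&lt;br /&gt;
To show what the HSE&amp;amp;beta; numbers mean, here is a small pure-Python sketch of the counting step. It is a simplified illustration under stated assumptions, not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code: the function &amp;lt;code&amp;gt;hse_beta&amp;lt;/code&amp;gt; is hypothetical, the coordinates are made up, and atoms lying exactly on the dividing plane are arbitrarily counted in the lower half sphere.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def hse_beta(ca, cb, other_cas, radius=13.0):
    """Count CA atoms within radius of ca, split by the half sphere they fall in."""
    side_chain = [b - a for a, b in zip(ca, cb)]   # direction from CA to CB
    up = 0
    down = 0
    for other in other_cas:
        diff = [o - a for a, o in zip(ca, other)]
        if math.sqrt(sum(d * d for d in diff)) > radius:
            continue   # outside the sphere, not counted at all
        if sum(d * s for d, s in zip(diff, side_chain)) > 0:
            up += 1    # half sphere on the side-chain side
        else:
            down += 1  # opposite half sphere
    return up, down


ca = (0.0, 0.0, 0.0)
cb = (0.0, 0.0, 1.5)
neighbours = [(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (0.0, 6.0, -2.0), (20.0, 0.0, 0.0)]
up, down = hse_beta(ca, cb, neighbours)
print(up, down, up + down)   # HSE-up, HSE-down and their sum, the contact number
```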
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
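&lt;br /&gt;
For readers who want to see what a rotation around an axis amounts to, the sketch below applies Rodrigues' rotation formula in plain Python. It is an illustration only: &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; is a hypothetical helper, the coordinates are made up, and the right-handed sign convention used here is an assumption that should be checked against &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; before relying on it.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate vector v by theta radians around axis (Rodrigues' formula)."""
    length = math.sqrt(sum(a * a for a in axis))
    k = [a / length for a in axis]   # unit vector along the rotation axis
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(a * b for a, b in zip(k, v))
    cos_t = math.cos(theta)
    sin_t = math.sin(theta)
    return tuple(v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
                 for i in range(3))


# Made-up backbone vectors (Angstrom), already centred on the CA atom
n = (1.0, 0.0, 0.0)
c = (0.0, 0.0, 1.5)
cb_at_origin = rotate_about_axis(n, c, -120.0 * math.pi / 180.0)
print(cb_at_origin)
```
&lt;br /&gt;
The rotated vector keeps the length of the original one, which is why the result still has to be translated back onto the C&amp;amp;alpha; position afterwards.&lt;br /&gt;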
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Superimposer&lt;br /&gt;
&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:39:20Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Who's using Bio.PDB? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac,...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements (see [http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. ''Bioinformatics'' '''19''': 2308&amp;amp;ndash;2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been, and still is, used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This took about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a Python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.MMCIF2Dict import MMCIF2Dict&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a Python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
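&lt;br /&gt;
As a rough illustration of the two file layouts described above, the sketch below builds the local path of a downloaded entry. The helper &amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; is hypothetical (it is not part of Bio.PDB), and it assumes that the PDB-style subdirectory is named after the two middle characters of the PDB ID, as on the wwPDB ftp server:&lt;br /&gt;

```python
import posixpath


def local_pdb_path(pdb_id, pdir, flat=False):
    """Return the assumed local path of a downloaded PDB entry.

    Hypothetical helper for illustration, not part of Bio.PDB.
    """
    code = pdb_id.lower()
    filename = "pdb%s.ent" % code
    if flat:
        # Equivalent of the -d option: everything in one directory
        return posixpath.join(pdir, filename)
    # PDB-style layout: subdirectory named after the two middle characters
    return posixpath.join(pdir, code[1:3], filename)


print(local_pdb_path("1FAT", "/data/pdb"))             # /data/pdb/fa/pdb1fat.ent
print(local_pdb_path("1FAT", "/data/pdb", flat=True))  # /data/pdb/pdb1fat.ent
```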
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). Bio.PDB tries to handle this by letting the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object behave in one of two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
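&lt;br /&gt;
The difference between the two modes can be pictured with a toy example. The sketch below is not the Bio.PDB implementation: the function &amp;lt;code&amp;gt;collect_residue_ids&amp;lt;/code&amp;gt; and the exception &amp;lt;code&amp;gt;ConstructionError&amp;lt;/code&amp;gt; are made-up names, and issuing a warning in permissive mode is an assumption of this sketch. It only shows the idea of dropping a duplicate residue identifier in one mode and raising an exception in the other:&lt;br /&gt;

```python
import warnings


class ConstructionError(Exception):
    """Raised in strict mode (hypothetical name for this sketch)."""


def collect_residue_ids(residue_ids, permissive=True):
    """Return the unique residue ids, in order of first appearance."""
    seen = set()
    kept = []
    for res_id in residue_ids:
        if res_id in seen:
            message = "duplicate residue id %r" % (res_id,)
            if not permissive:
                # Strict mode: stop at the first problem
                raise ConstructionError(message)
            # Permissive mode: warn and leave the duplicate out
            warnings.warn(message)
            continue
        seen.add(res_id)
        kept.append(res_id)
    return kept


ids = [(" ", 10, " "), (" ", 11, " "), (" ", 10, " ")]
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    # The second copy of residue 10 is dropped
    print(collect_residue_ids(ids))
```

Calling &amp;lt;code&amp;gt;collect_residue_ids(ids, permissive=False)&amp;lt;/code&amp;gt; on the same list raises &amp;lt;code&amp;gt;ConstructionError&amp;lt;/code&amp;gt; instead.&lt;br /&gt;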
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_name()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional features are added only when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
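&lt;br /&gt;
To make the parent/child idea concrete without needing a PDB file, here is a minimal toy container. It is only a sketch of the concept and not the actual Bio.PDB &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; class; the attribute names and the &amp;lt;code&amp;gt;add&amp;lt;/code&amp;gt; method are assumptions chosen for the example:&lt;br /&gt;

```python
class Entity(object):
    """Toy container with parent/child links (not the Bio.PDB class)."""

    def __init__(self, id):
        self.id = id
        self.parent = None
        self.child_dict = {}
        self.child_list = []

    def add(self, child):
        child.parent = self
        self.child_dict[child.id] = child
        self.child_list.append(child)

    def __getitem__(self, id):
        return self.child_dict[id]

    def __iter__(self):
        return iter(self.child_list)

    def has_id(self, id):
        return id in self.child_dict

    def get_parent(self):
        return self.parent


# Build a tiny structure: one model, one chain, one residue, two atoms
structure = Entity("toy")
model = Entity(0)
chain = Entity("A")
residue = Entity((" ", 100, " "))
structure.add(model)
model.add(chain)
chain.add(residue)
for name in ("N", "CA"):
    residue.add(Entity(name))

atom = structure[0]["A"][(" ", 100, " ")]["CA"]
print(atom.get_parent() is residue)  # True
print(residue.has_id("CB"))          # False
print([a.id for a in residue])       # ['N', 'CA']
```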
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag were not used.&lt;br /&gt;
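&lt;br /&gt;
The three-element id can be pictured with a small helper. This is a sketch for illustration only: &amp;lt;code&amp;gt;make_residue_id&amp;lt;/code&amp;gt; is a hypothetical function, not a Bio.PDB call, and recognising water by the residue name &amp;lt;code&amp;gt;'HOH'&amp;lt;/code&amp;gt; is an assumption of the example:&lt;br /&gt;

```python
def make_residue_id(resname, resseq, icode=" ", hetero=False):
    """Build a (hetero-flag, sequence identifier, insertion code) tuple.

    Hypothetical helper for illustration only. Water is recognised here
    by the residue name 'HOH', which is an assumption of this sketch.
    """
    if resname == "HOH":
        flag = "W"
    elif hetero:
        flag = "H_" + resname
    else:
        flag = " "
    return (flag, resseq, icode)


print(make_residue_id("GLC", 100, "A", hetero=True))  # ('H_GLC', 100, 'A')
print(make_residue_id("HOH", 100))                    # ('W', 100, ' ')
print(make_residue_id("SER", 100))                    # (' ', 100, ' ')
```

The last two ids share the sequence identifier 100 but still differ, because the hetero-flag is part of the id.&lt;br /&gt;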
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom id is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file.&lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots here stand for spaces. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
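&lt;br /&gt;
The naming rule can be sketched as follows. The function &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; is a toy written for this example, not Bio.PDB code. In particular, it keeps the spaces for every atom involved in a clash, which is an assumption, since the text does not say whether Bio.PDB does this for all of them or only for some:&lt;br /&gt;

```python
def atom_ids(full_names):
    """Map the 4-character atom names of one residue to atom ids.

    Toy version of the rule described above: strip the spaces, unless
    that would give two atoms in the residue the same id.
    """
    stripped = [name.strip() for name in full_names]
    ids = []
    for full, short in zip(full_names, stripped):
        if stripped.count(short) == 1:
            ids.append(short)
        else:
            # Stripping would clash, so keep the spaces
            ids.append(full)
    return ids


print(atom_ids([" N  ", " CA ", " C  "]))  # ['N', 'CA', 'C']
print(atom_ids([" CA ", "CA  "]))          # [' CA ', 'CA  ']
```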
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
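The forwarding behaviour can be mimicked in a few lines of plain Python. The classes below are toys written for this example (they are not the Bio.PDB classes, and their internals are assumptions); they only show how a wrapper can hand every unknown attribute lookup to the currently selected child, with the highest occupancy as the default choice:&lt;br /&gt;

```python
class ToyAtom(object):
    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_occupancy(self):
        return self.occupancy


class ToyDisorderedAtom(object):
    """Wrapper that behaves like the currently selected child atom."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def disordered_add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the position with the highest occupancy
        best = max(self.children.values(), key=lambda a: a.occupancy)
        self.selected = best.altloc

    def disordered_select(self, altloc):
        self.selected = altloc

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself
        return getattr(self.children[self.selected], name)


og = ToyDisorderedAtom()
og.disordered_add(ToyAtom("OG", "A", 0.3))
og.disordered_add(ToyAtom("OG", "B", 0.7))
print("%s %.1f" % (og.altloc, og.get_occupancy()))  # B 0.7
og.disordered_select("A")
print("%s %.1f" % (og.altloc, og.get_occupancy()))  # A 0.3
```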
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factors if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (eg. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. ' CA ')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
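&lt;br /&gt;
For intuition: the value you get back is the ordinary Euclidean distance&lt;br /&gt;
between the two coordinate triples. Here is a minimal pure-Python sketch of&lt;br /&gt;
that calculation. It is not Bio.PDB's own code; the function name&lt;br /&gt;
&amp;lt;code&amp;gt;euclidean_distance&amp;lt;/code&amp;gt; and the coordinates are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def euclidean_distance(coord1, coord2):
    """Return the straight-line distance between two (x, y, z) triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(coord1, coord2)))


# Invented coordinates (in Angstrom) standing in for two C-alpha atoms
ca1_coord = (1.0, 2.0, 3.0)
ca2_coord = (4.0, 6.0, 3.0)
print(euclidean_distance(ca1_coord, ca2_coord))  # prints 5.0
```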
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
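&lt;br /&gt;
The angle in question is the one at the middle point, between the directions&lt;br /&gt;
towards the first and the third point. The sketch below spells out that&lt;br /&gt;
geometry in plain Python, returning radians. It only illustrates the math&lt;br /&gt;
and is not the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module's implementation; the name&lt;br /&gt;
&amp;lt;code&amp;gt;angle_at_middle&amp;lt;/code&amp;gt; and the sample points are hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def angle_at_middle(p1, p2, p3):
    """Angle in radians at p2, between the directions p2->p1 and p2->p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] so rounding noise cannot push acos out of its domain
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Invented points forming a right angle at the origin
angle = angle_at_middle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(math.degrees(angle))  # approximately 90 degrees
```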
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
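&lt;br /&gt;
If you are curious what a torsion angle calculation boils down to, here is&lt;br /&gt;
a small pure-Python sketch. It projects the two outer bonds onto the plane&lt;br /&gt;
perpendicular to the central bond and measures the signed angle between the&lt;br /&gt;
projections, in radians. This is a textbook formulation, not the code of&lt;br /&gt;
&amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt;; the helper names and the sign convention&lt;br /&gt;
(cis is 0, trans is &amp;amp;pi;) are assumptions of this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in radians around the p2-p3 bond."""
    b0 = _sub(p1, p2)
    b1 = _sub(p3, p2)
    b2 = _sub(p4, p3)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Parts of the outer bonds that are perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.atan2(_dot(_cross(b1, v), w), _dot(v, w))


# Invented points: a planar cis arrangement, so the torsion is zero
print(dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)))
```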
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
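&lt;br /&gt;
To make the question concrete: a contact search asks which pairs of atoms&lt;br /&gt;
lie within a given radius of each other. The sketch below answers that by&lt;br /&gt;
brute force in plain Python, comparing every pair. It is only a stand-in to&lt;br /&gt;
show the idea (a KD tree gives the same answer without testing all pairs);&lt;br /&gt;
the function name &amp;lt;code&amp;gt;contacts_within&amp;lt;/code&amp;gt; and the coordinates are invented.&lt;br /&gt;
&lt;br /&gt;
```python
import math
from itertools import combinations


def contacts_within(coords, radius):
    """Return index pairs (i, j), with i before j, no further apart than radius."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
        if radius >= distance:
            pairs.append((i, j))
    return pairs


# Invented coordinates: only the first two points are close together
atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(contacts_within(atoms, 2.0))  # prints [(0, 1)]
```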
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms. Polypeptides can be built using a C-N&lt;br /&gt;
(&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt;) or a C&amp;amp;alpha;-C&amp;amp;alpha; (&amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;) distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using the C-N distance criterion&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using the CA-CA distance criterion&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is also possible to build&lt;br /&gt;
&amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects.&lt;br /&gt;
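&lt;br /&gt;
The idea behind a distance criterion is simple: walk along the chain and&lt;br /&gt;
start a new polypeptide whenever two consecutive residues are too far apart&lt;br /&gt;
to be bonded. Here is a rough pure-Python sketch of that idea using C&amp;amp;alpha;&lt;br /&gt;
coordinates only. It is not the builders' actual code; the function names,&lt;br /&gt;
the 4.3 &amp;amp;Aring; cutoff and the coordinates are assumptions for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_by_distance(coords, cutoff):
    """Split an ordered list of points into runs of connected neighbours."""
    segments = []
    for coord in coords:
        if segments and cutoff >= _distance(segments[-1][-1], coord):
            segments[-1].append(coord)  # close enough: extend the current run
        else:
            segments.append([coord])    # first point or a gap: start a new run
    return segments


# Invented C-alpha trace with one chain break after the third residue
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
         (20.0, 0.0, 0.0), (23.8, 0.0, 0.0)]
print([len(segment) for segment in split_by_distance(trace, 4.3)])  # prints [3, 2]
```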
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| border="1"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
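&lt;br /&gt;
Given a set of surface points, the definition above translates almost&lt;br /&gt;
literally into code: take each atom's distance to the nearest surface point&lt;br /&gt;
and average over the residue. The following plain-Python sketch does just&lt;br /&gt;
that. It is not the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; implementation; the function&lt;br /&gt;
names and all coordinates are invented for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def _distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def depth_of_point(coord, surface_points):
    """Distance from one point to the nearest surface point."""
    return min(_distance(coord, point) for point in surface_points)


def residue_depth_sketch(atom_coords, surface_points):
    """Average depth of a residue's atoms below the surface."""
    depths = [depth_of_point(coord, surface_points) for coord in atom_coords]
    return sum(depths) / len(depths)


# Invented surface points and atom coordinates
surface = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
residue_atoms = [(1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(residue_depth_sketch(residue_atoms, surface))  # prints 2.0
```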
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
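&lt;br /&gt;
To show what is being counted, here is a bare-bones pure-Python sketch of&lt;br /&gt;
the half sphere idea: every other C&amp;amp;alpha; atom within the 13 &amp;amp;Aring; sphere&lt;br /&gt;
goes into the "up" half if it lies on the side the direction vector points&lt;br /&gt;
to, and into the "down" half otherwise. This is not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt;&lt;br /&gt;
code; the function name, the way the direction vector is supplied, and the&lt;br /&gt;
coordinates are assumptions of this sketch.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def half_sphere_counts(center, direction, other_ca_coords, radius=13.0):
    """Count neighbours in the half sphere along direction and opposite to it."""
    up = 0
    down = 0
    for coord in other_ca_coords:
        offset = [a - b for a, b in zip(coord, center)]
        if math.sqrt(sum(x * x for x in offset)) > radius:
            continue  # outside the sphere: not counted at all
        if sum(o * d for o, d in zip(offset, direction)) >= 0:
            up += 1    # same side as the (approximate) side chain direction
        else:
            down += 1  # opposite side
    return up, down


# Invented data: the direction could be, for example, the CA->CB vector
neighbours = [(0.0, 0.0, 5.0), (1.0, 1.0, 3.0), (0.0, 0.0, -4.0), (0.0, 0.0, 20.0)]
print(half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), neighbours))  # prints (2, 1)
```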
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
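&lt;br /&gt;
For those who want to see the rotation itself, the virtual C&amp;amp;beta; recipe&lt;br /&gt;
can be sketched without any library using Rodrigues' rotation formula. The&lt;br /&gt;
code below assumes a right-handed rotation about the given axis; the&lt;br /&gt;
function names and the backbone coordinates are invented, and this is an&lt;br /&gt;
illustration of the geometry, not the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module's code.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rotate_about_axis(vec, axis, theta):
    """Rotate vec by theta radians about axis (right-handed, Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    k_dot_v = sum(a * b for a, b in zip(k, vec))
    k_cross_v = (
        k[1] * vec[2] - k[2] * vec[1],
        k[2] * vec[0] - k[0] * vec[2],
        k[0] * vec[1] - k[1] * vec[0],
    )
    return tuple(
        v * cos_t + c * sin_t + a * k_dot_v * (1.0 - cos_t)
        for v, c, a in zip(vec, k_cross_v, k)
    )


def virtual_cb(n, ca, c):
    """Rotate the CA->N vector by -120 degrees about CA->C, then shift back to CA."""
    n_rel = [a - b for a, b in zip(n, ca)]
    c_rel = [a - b for a, b in zip(c, ca)]
    rotated = rotate_about_axis(n_rel, c_rel, -math.pi * 120.0 / 180.0)
    return tuple(r + a for r, a in zip(rotated, ca))


# Invented backbone coordinates, only to exercise the function
print(virtual_cb((1.46, 0.0, 0.0), (0.0, 0.0, 0.0), (-0.55, 1.42, 0.0)))
```
&lt;br /&gt;
Since a rotation preserves lengths, the virtual C&amp;amp;beta; ends up at the same&lt;br /&gt;
distance from the C&amp;amp;alpha; atom as the N atom.&lt;br /&gt;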
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix&lt;br /&gt;
Computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
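&lt;br /&gt;
A word on what "right multiplying" means in practice: each coordinate is&lt;br /&gt;
treated as a row vector, multiplied by the rotation matrix on its right, and&lt;br /&gt;
then shifted by the translation. The plain-Python sketch below shows that&lt;br /&gt;
arithmetic. The function name &amp;lt;code&amp;gt;apply_rotran&amp;lt;/code&amp;gt; and the numbers are&lt;br /&gt;
hypothetical; in real use the rotation and translation would come from&lt;br /&gt;
&amp;lt;code&amp;gt;sup.rotran&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
def apply_rotran(coords, rot, tran):
    """Return the coordinates transformed as coord * rot + tran (row vectors)."""
    moved = []
    for coord in coords:
        moved.append(tuple(
            sum(coord[i] * rot[i][j] for i in range(3)) + tran[j]
            for j in range(3)
        ))
    return moved


# Invented rotation (a quarter turn about z) and translation
rotation = [[0.0, 1.0, 0.0],
            [-1.0, 0.0, 0.0],
            [0.0, 0.0, 1.0]]
translation = (1.0, 1.0, 1.0)
print(apply_rotran([(1.0, 0.0, 0.0)], rotation, translation))  # prints [(1.0, 2.0, 1.0)]
```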
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:37:44Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Who's using Bio.PDB? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active-site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins'' '''51''': 96&amp;amp;ndash;108 (2003)]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'' '''6''': 202 (2005)]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
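&lt;br /&gt;
One practical note: the values come straight from the file, so for numerical&lt;br /&gt;
work you will usually want to convert them first. The sketch below assumes&lt;br /&gt;
the coordinates arrive as text strings and uses a small hand-written&lt;br /&gt;
dictionary as a stand-in for a real &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; result; the helper&lt;br /&gt;
name &amp;lt;code&amp;gt;to_floats&amp;lt;/code&amp;gt; and the values are invented.&lt;br /&gt;
&lt;br /&gt;
```python
def to_floats(values):
    """Convert a list of numeric strings to floats."""
    return [float(value) for value in values]


# Hand-written stand-in for a parsed mmCIF dictionary (values are invented)
mmcif_like = {"_atom_site.Cartn_y": ["12.5", "13.0", "11.75"]}
y_values = to_floats(mmcif_like["_atom_site.Cartn_y"])
print(max(y_values) - min(y_values))  # prints 1.25
```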
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, ie. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but about their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.PDBIO import PDBIO, Select&lt;br /&gt;
&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        # Keep the residue only if it is a glycine&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule, or blank (&amp;lt;code&amp;gt;' '&amp;lt;/code&amp;gt;) for a standard amino or nucleic acid.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. For example, a mutant with a Ser inserted between Thr 80 and Asn 81 could have the following sequence identifiers and insertion codes: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
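&lt;br /&gt;
As a rough illustration of the shortcut rule and of the collisions described above, the following sketch expands a bare sequence identifier into the full three-element id and uses it as a dictionary key. The helper &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; and the &amp;lt;code&amp;gt;toy_chain&amp;lt;/code&amp;gt; dictionary are hypothetical and not part of Bio.PDB; they only mimic the behaviour described in this entry.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
def full_residue_id(res_id):&lt;br /&gt;
    # Expand a shortcut id (an int) into (hetero-flag, sequence id, insertion code)&lt;br /&gt;
    if isinstance(res_id, int):&lt;br /&gt;
        # Blank hetero-flag and blank insertion code&lt;br /&gt;
        return (' ', res_id, ' ')&lt;br /&gt;
    return tuple(res_id)&lt;br /&gt;
&lt;br /&gt;
# A toy chain: three residues that share sequence identifier 100&lt;br /&gt;
toy_chain = {&lt;br /&gt;
    (' ', 100, ' '): 'SER',      # ordinary amino acid&lt;br /&gt;
    ('H_GLC', 100, ' '): 'GLC',  # glucose hetero-residue&lt;br /&gt;
    ('W', 100, ' '): 'HOH',      # water&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
print(toy_chain[full_residue_id(100)])                  # SER&lt;br /&gt;
print(toy_chain[full_residue_id(('H_GLC', 100, ' '))])  # GLC&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Keyed on the sequence identifier alone, the three entries would overwrite each other, which is exactly the problem the hetero-flag avoids.&lt;br /&gt;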
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
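The forwarding behaviour described above can be sketched in a few lines of plain Python. The classes &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedAtom&amp;lt;/code&amp;gt; below are hypothetical stand-ins and not the Bio.PDB implementation; they only show how a wrapper can forward every call it does not handle itself to the selected position.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class ToyAtom(object):&lt;br /&gt;
    # Stand-in for one atom position (hypothetical, not Bio.PDB's Atom)&lt;br /&gt;
    def __init__(self, altloc, occupancy):&lt;br /&gt;
        self.altloc = altloc&lt;br /&gt;
        self.occupancy = occupancy&lt;br /&gt;
&lt;br /&gt;
    def get_altloc(self):&lt;br /&gt;
        return self.altloc&lt;br /&gt;
&lt;br /&gt;
    def get_occupancy(self):&lt;br /&gt;
        return self.occupancy&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
class ToyDisorderedAtom(object):&lt;br /&gt;
    # Stores all positions of one physical atom, keyed on altloc&lt;br /&gt;
    def __init__(self):&lt;br /&gt;
        self.children = {}&lt;br /&gt;
        self.selected = None&lt;br /&gt;
&lt;br /&gt;
    def add(self, atom):&lt;br /&gt;
        self.children[atom.get_altloc()] = atom&lt;br /&gt;
        # Default: select the position with the highest occupancy&lt;br /&gt;
        self.selected = max(self.children.values(),&lt;br /&gt;
                            key=lambda a: a.get_occupancy())&lt;br /&gt;
&lt;br /&gt;
    def disordered_select(self, altloc):&lt;br /&gt;
        self.selected = self.children[altloc]&lt;br /&gt;
&lt;br /&gt;
    def __getattr__(self, name):&lt;br /&gt;
        # Only reached for names the wrapper does not define itself,&lt;br /&gt;
        # so those calls are forwarded to the selected position&lt;br /&gt;
        return getattr(self.selected, name)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
og = ToyDisorderedAtom()&lt;br /&gt;
og.add(ToyAtom('A', 0.4))&lt;br /&gt;
og.add(ToyAtom('B', 0.6))&lt;br /&gt;
print(og.get_altloc())      # B&lt;br /&gt;
og.disordered_select('A')&lt;br /&gt;
print(og.get_altloc())      # A&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;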
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to different residue types (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
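&lt;br /&gt;
To make clear what is being measured, here is a pure-Python sketch of the same calculation: the angle at the middle point, between the directions towards the first and the third point. The function name &amp;lt;code&amp;gt;angle_between&amp;lt;/code&amp;gt; is hypothetical and plain tuples stand in for &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; objects; the result is in radians.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def angle_between(p1, p2, p3):&lt;br /&gt;
    # Angle (in radians) at p2, between the directions from p2 to p1 and from p2 to p3&lt;br /&gt;
    a = [p1[i] - p2[i] for i in range(3)]&lt;br /&gt;
    b = [p3[i] - p2[i] for i in range(3)]&lt;br /&gt;
    dot = sum(x * y for x, y in zip(a, b))&lt;br /&gt;
    norm_a = math.sqrt(sum(x * x for x in a))&lt;br /&gt;
    norm_b = math.sqrt(sum(x * x for x in b))&lt;br /&gt;
    # Clamp to [-1, 1] to guard against rounding errors&lt;br /&gt;
    cos_angle = max(-1.0, min(1.0, dot / (norm_a * norm_b)))&lt;br /&gt;
    return math.acos(cos_angle)&lt;br /&gt;
&lt;br /&gt;
right_angle = angle_between((1, 0, 0), (0, 0, 0), (0, 1, 0))&lt;br /&gt;
print(round(math.degrees(right_angle), 6))  # 90.0&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;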
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
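&lt;br /&gt;
For readers who want to see what a torsion angle over four points amounts to, the following pure-Python sketch computes it from coordinate tuples. All function names are hypothetical, and the sign convention used here (positive for a clockwise rotation when looking from the second point towards the third) is an assumption of this sketch that may differ from that of &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt;. The result is in radians.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def sub(a, b):&lt;br /&gt;
    return tuple(x - y for x, y in zip(a, b))&lt;br /&gt;
&lt;br /&gt;
def dot(a, b):&lt;br /&gt;
    return sum(x * y for x, y in zip(a, b))&lt;br /&gt;
&lt;br /&gt;
def cross(a, b):&lt;br /&gt;
    return (a[1] * b[2] - a[2] * b[1],&lt;br /&gt;
            a[2] * b[0] - a[0] * b[2],&lt;br /&gt;
            a[0] * b[1] - a[1] * b[0])&lt;br /&gt;
&lt;br /&gt;
def dihedral(p1, p2, p3, p4):&lt;br /&gt;
    # Torsion angle (in radians) around the central bond from p2 to p3&lt;br /&gt;
    b0 = sub(p1, p2)&lt;br /&gt;
    b1 = sub(p3, p2)&lt;br /&gt;
    b2 = sub(p4, p3)&lt;br /&gt;
    norm = math.sqrt(dot(b1, b1))&lt;br /&gt;
    b1 = tuple(x / norm for x in b1)&lt;br /&gt;
    # v and w: the parts of b0 and b2 perpendicular to the central bond&lt;br /&gt;
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))&lt;br /&gt;
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))&lt;br /&gt;
    return math.atan2(dot(cross(b1, v), w), dot(v, w))&lt;br /&gt;
&lt;br /&gt;
cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))&lt;br /&gt;
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))&lt;br /&gt;
print(round(math.degrees(cis), 1))         # 0.0&lt;br /&gt;
print(round(abs(math.degrees(trans)), 1))  # 180.0&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;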
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
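&lt;br /&gt;
To show what question such a search answers, here is a naive all-against-all version in plain Python. It is only an illustration: the function name &amp;lt;code&amp;gt;find_contacts&amp;lt;/code&amp;gt; is hypothetical, it works on bare coordinate tuples instead of &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects, and it compares every pair, which is exactly the work a KD tree avoids.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
from itertools import combinations&lt;br /&gt;
&lt;br /&gt;
def find_contacts(coords, radius):&lt;br /&gt;
    # Return the index pairs (i, j) of all points at most radius apart&lt;br /&gt;
    contacts = []&lt;br /&gt;
    for (i, a), (j, b) in combinations(enumerate(coords), 2):&lt;br /&gt;
        distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))&lt;br /&gt;
        if distance &amp;lt;= radius:&lt;br /&gt;
            contacts.append((i, j))&lt;br /&gt;
    return contacts&lt;br /&gt;
&lt;br /&gt;
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]&lt;br /&gt;
print(find_contacts(coords, 2.0))  # [(0, 1)]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;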
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use one of the polypeptide builder classes, &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
(&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt;) or a C&amp;amp;alpha;-C&amp;amp;alpha; (&amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;) distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Polypeptide import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is possible to use the builders&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|+ DSSP codes in Bio.PDB.&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
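&lt;br /&gt;
If you want to work with these codes in a script, a small lookup table is often all you need. The sketch below is plain Python and does not use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class; the names &amp;lt;code&amp;gt;DSSP_CODES&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;count_codes&amp;lt;/code&amp;gt; and the example string of per-residue codes are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# The DSSP codes listed above, as a dictionary&lt;br /&gt;
DSSP_CODES = {&lt;br /&gt;
    'H': 'alpha-helix',&lt;br /&gt;
    'B': 'isolated beta-bridge residue',&lt;br /&gt;
    'E': 'strand',&lt;br /&gt;
    'G': '3-10 helix',&lt;br /&gt;
    'I': 'pi-helix',&lt;br /&gt;
    'T': 'turn',&lt;br /&gt;
    'S': 'bend',&lt;br /&gt;
    '-': 'other',&lt;br /&gt;
}&lt;br /&gt;
&lt;br /&gt;
def count_codes(ss_string):&lt;br /&gt;
    # Count how often each kind of secondary structure occurs&lt;br /&gt;
    # in a string with one DSSP code per residue&lt;br /&gt;
    counts = {}&lt;br /&gt;
    for code in ss_string:&lt;br /&gt;
        name = DSSP_CODES[code]&lt;br /&gt;
        counts[name] = counts.get(name, 0) + 1&lt;br /&gt;
    return counts&lt;br /&gt;
&lt;br /&gt;
summary = count_codes('-HHHHT-EEE')&lt;br /&gt;
print(summary['alpha-helix'])  # 4&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;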
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see the previous entry). Residue depth,&lt;br /&gt;
described in the next entry, is an alternative measure of solvent accessibility.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
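&lt;br /&gt;
The definition of residue depth given above can be sketched in plain Python (3.8 or later). This is only an illustration and not the Bio.PDB implementation: the function names and the toy coordinates are made up, and the surface is assumed to be available as a list of points.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def residue_depth(atom_coords, surface_points):
    """Average distance from a residue's atoms to the nearest surface point."""
    nearest = [
        min(math.dist(atom, point) for point in surface_points)
        for atom in atom_coords
    ]
    return sum(nearest) / len(nearest)

def ca_depth(ca_coord, surface_points):
    """Distance from the C-alpha atom to the nearest surface point."""
    return min(math.dist(ca_coord, point) for point in surface_points)

# Hypothetical data: two surface points and a two-atom "residue".
surface = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
atoms = [(0.0, 3.0, 0.0), (10.0, 0.0, 5.0)]
print(residue_depth(atoms, surface), ca_depth(atoms[0], surface))
```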
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. It has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
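&lt;br /&gt;
To make the counting idea concrete, here is a minimal sketch in plain Python (3.8 or later). It is not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code: the function name and the coordinates are invented, and how the side chain direction is obtained (from C&amp;amp;alpha; positions alone for HSE&amp;amp;alpha;, or from the C&amp;amp;alpha; and C&amp;amp;beta; positions for HSE&amp;amp;beta;) is left out and simply passed in as a vector.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def half_sphere_exposure(center, direction, other_ca, radius=13.0):
    """Count C-alpha atoms within `radius` of `center`, split by half sphere.

    Returns (up, down): `up` counts atoms on the side `direction` points to.
    """
    up = down = 0
    for ca in other_ca:
        if math.dist(ca, center) > radius:
            continue  # outside the sphere
        offset = [a - b for a, b in zip(ca, center)]
        dot = sum(o * d for o, d in zip(offset, direction))
        if dot > 0:
            up += 1
        else:
            down += 1
    return up, down

# Hypothetical data: residue at the origin, side chain pointing along +z.
neighbours = [(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (1.0, 0.0, -5.0), (0.0, 0.0, 20.0)]
print(half_sphere_exposure((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), neighbours))
```
&lt;br /&gt;
The sum of the two counts is the number of neighbours inside the full sphere.&lt;br /&gt;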
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi*120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
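&lt;br /&gt;
For readers who want to see what the rotation does numerically, the virtual C&amp;amp;beta; construction can be retraced without Bio.PDB. The sketch below uses Rodrigues' rotation formula and assumes the usual right-handed convention for a rotation about an axis; the helper &amp;lt;code&amp;gt;rotate&amp;lt;/code&amp;gt; and the backbone coordinates are made up for illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def rotate(v, axis, angle):
    """Rotate vector v around axis by angle (radians) using Rodrigues' formula."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the rotation axis
    cos_t, sin_t = math.cos(angle), math.sin(angle)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    return [
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    ]

# Hypothetical backbone coordinates of a Gly residue (not from a real structure).
n = [1.458, 0.0, 0.0]
ca = [0.0, 0.0, 0.0]
c = [-0.551, 1.420, 0.0]

n_rel = [a - b for a, b in zip(n, ca)]    # center at origin
c_rel = [a - b for a, b in zip(c, ca)]
cb_rel = rotate(n_rel, c_rel, -math.pi * 120.0 / 180.0)
cb = [a + b for a, b in zip(cb_rel, ca)]  # put back on top of ca
print(cb)
```
&lt;br /&gt;
Because a rotation preserves lengths, the virtual C&amp;amp;beta; ends up at the same distance from C&amp;amp;alpha; as the N atom.&lt;br /&gt;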
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
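&lt;br /&gt;
The remark that the rotation is right multiplying deserves a small illustration. The sketch below assumes that a coordinate is treated as a row vector, multiplied by the rotation matrix from the right, and then shifted by the translation; the function name and the numbers are invented, and plain lists stand in for the arrays stored in &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
def transform(coord, rot, tran):
    """Right-multiply a coordinate (row vector) by rot, then add tran."""
    rotated = [sum(coord[i] * rot[i][j] for i in range(3)) for j in range(3)]
    return [r + t for r, t in zip(rotated, tran)]

# Hypothetical rotation/translation pair. When right multiplying, this matrix
# takes the x axis to the y axis (it is the transpose of the matrix that would
# do the same when multiplying a column vector from the left).
rot = [[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
tran = [1.0, 2.0, 3.0]
print(transform([1.0, 0.0, 0.0], rot, tran))  # [1.0, 3.0, 3.0]
```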
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:33:29Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How is disorder handled? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, windows, Mac,...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;blockquote&amp;gt;&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&amp;lt;/blockquote&amp;gt;&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
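&lt;br /&gt;
Since a tag may map either to a single value or to a list, and the values come out of the file as text, a small helper can make numeric use easier. The sketch below is an assumption-laden illustration: the helper name is invented, and a hand-written dictionary with made-up values stands in for the result of &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
```python
def as_float_list(value):
    """Return the value(s) of a tag as a list of floats, whether single or multiple."""
    values = value if isinstance(value, list) else [value]
    return [float(v) for v in values]

# Stand-in for a dictionary returned by MMCIF2Dict (values are made up).
mmcif_dict = {
    "_exptl_crystal.density_percent_sol": "45.8",
    "_atom_site.Cartn_y": ["12.125", "13.400", "14.075"],
}
solvent_content = as_float_list(mmcif_dict["_exptl_crystal.density_percent_sol"])
y_values = as_float_list(mmcif_dict["_atom_site.Cartn_y"])
print(solvent_content, y_values)
```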
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
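&lt;br /&gt;
As a rough guide to where a file ends up, the two layouts can be sketched as follows. The function is hypothetical, and it assumes that the PDB-style subdirectory is named after the two middle characters of the PDB ID, as on the PDB ftp site, and that files are named as in the &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; example above.&lt;br /&gt;
&lt;br /&gt;
```python
import posixpath

def local_pdb_path(root, pdb_id, flat=False):
    """Guess the local path of a downloaded PDB entry.

    flat=True mimics the -d option (all files in one directory); otherwise
    the file is assumed to sit in a subdirectory named after the middle two
    characters of the PDB ID.
    """
    code = pdb_id.lower()
    filename = "pdb%s.ent" % code
    if flat:
        return posixpath.join(root, filename)
    return posixpath.join(root, code[1:3], filename)

print(local_pdb_path("/data/pdb", "1FAT"))
print(local_pdb_path("/data/pdb", "1FAT", flat=True))
```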
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can deal with this&lt;br /&gt;
in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
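&lt;br /&gt;
The difference between the two states can be pictured with a toy check for duplicate atom identifiers. This is a sketch of the idea only, not the parser's code: the function and the identifiers are made up, and &amp;lt;code&amp;gt;ValueError&amp;lt;/code&amp;gt; is a stand-in for whatever exception the parser raises.&lt;br /&gt;
&lt;br /&gt;
```python
def filter_atoms(atom_ids, permissive=True):
    """Drop (permissive) or reject (strict) atoms whose identifier was already seen.

    Each identifier is a (chain, residue number, atom name, altloc) tuple.
    """
    seen = set()
    kept = []
    for atom_id in atom_ids:
        if atom_id in seen:
            if not permissive:
                raise ValueError("duplicate atom identifier: %r" % (atom_id,))
            continue  # permissive: leave the duplicate out
        seen.add(atom_id)
        kept.append(atom_id)
    return kept

# Hypothetical identifiers; the third repeats the first.
atom_ids = [("A", 10, "CA", " "), ("A", 10, "CB", " "), ("A", 10, "CA", " ")]
print(filter_atoms(atom_ids))
```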
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture : &lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
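&lt;br /&gt;
To make the parent/child bookkeeping concrete, here is a minimal sketch in plain Python.&lt;br /&gt;
It is not Bio.PDB's actual code: the &amp;lt;code&amp;gt;ToyEntity&amp;lt;/code&amp;gt; class and its &amp;lt;code&amp;gt;add&amp;lt;/code&amp;gt;&lt;br /&gt;
method are hypothetical names made up for this illustration, and only &amp;lt;code&amp;gt;get_parent&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; mirror the methods mentioned above.&lt;br /&gt;
```python
class ToyEntity:
    """Hypothetical stand-in for an SMCRA entity (not the Bio.PDB class)."""

    def __init__(self, entity_id):
        self.id = entity_id
        self.parent = None
        self.children = {}  # child id to child entity, in insertion order

    def add(self, child):
        # Register the child and remember who its parent is
        child.parent = self
        self.children[child.id] = child
        return child

    def get_parent(self):
        return self.parent

    def has_id(self, child_id):
        return child_id in self.children

    def __getitem__(self, child_id):
        return self.children[child_id]

    def __iter__(self):
        return iter(self.children.values())


# Build structure, model, chain, residue and atom, top down
structure = ToyEntity('X')
model = structure.add(ToyEntity(0))
chain = model.add(ToyEntity('A'))
residue = chain.add(ToyEntity((' ', 100, ' ')))
atom = residue.add(ToyEntity('CA'))

print(atom.get_parent().get_parent().id)  # A
print(residue.has_id('CA'))               # True
```
Every level uses the same class here, which is the point of the sketch: iteration, indexing and the parent lookup work identically at each level of the hierarchy.&lt;br /&gt;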
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Selection&lt;br /&gt;
&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
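&lt;br /&gt;
The clash that the hetero-flag avoids can be sketched with an ordinary dictionary keyed by full residue ids.&lt;br /&gt;
This is an illustration only: the &amp;lt;code&amp;gt;lookup&amp;lt;/code&amp;gt; helper and the toy &amp;lt;code&amp;gt;chain&amp;lt;/code&amp;gt; dictionary&lt;br /&gt;
are hypothetical, and the residues in it are made up.&lt;br /&gt;
```python
# Toy chain: maps full residue ids to residue names (made-up data)
chain = {
    (' ', 100, ' '): 'ALA',      # ordinary amino acid
    ('H_GLC', 100, ' '): 'GLC',  # glucose sharing sequence identifier 100
    ('W', 100, ' '): 'HOH',      # water sharing sequence identifier 100
    (' ', 80, 'A'): 'THR',       # insertion codes keep the numbering aligned
    (' ', 80, 'B'): 'SER',
}


def lookup(chain, key):
    """Accept a full id tuple, or a bare integer as the shortcut id."""
    if isinstance(key, int):
        key = (' ', key, ' ')  # blank hetero-flag and blank insertion code
    return chain[key]


print(lookup(chain, 100))                  # ALA
print(lookup(chain, ('H_GLC', 100, ' ')))  # GLC
print(lookup(chain, (' ', 80, 'B')))       # SER
```
Without the first tuple element the three residues numbered 100 would all compete for the same key.&lt;br /&gt;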
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom id is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots here&lt;br /&gt;
stand for spaces. In cases where stripping the spaces would create problems&lt;br /&gt;
(i.e. two atoms called &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
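The stripping rule can be sketched as follows, in plain Python. The &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; function is a&lt;br /&gt;
hypothetical helper, and it assumes that every clashing atom keeps its spaces; Bio.PDB's parser&lt;br /&gt;
may resolve a clash differently (for instance only for the second atom it meets).&lt;br /&gt;
```python
from collections import Counter


def atom_ids(full_names):
    """Return one id per full atom name taken from a single residue.

    Spaces are stripped unless that would give two atoms the same id,
    in which case the full (space-padded) name is kept.
    """
    stripped = [name.strip() for name in full_names]
    counts = Counter(stripped)
    return [
        short if counts[short] == 1 else full
        for full, short in zip(full_names, stripped)
    ]


print(atom_ids([' N  ', ' CA ', ' C  ']))  # ['N', 'CA', 'C']
print(atom_ids([' CA ', 'CA  ']))          # [' CA ', 'CA  ']
```
&lt;br /&gt;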
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
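The forwarding idea can be sketched in a few lines of plain Python. The classes below are hypothetical toys,&lt;br /&gt;
not the Bio.PDB implementation; only the method name &amp;lt;code&amp;gt;disordered_select&amp;lt;/code&amp;gt; and the&lt;br /&gt;
highest-occupancy default are taken from the text above.&lt;br /&gt;
```python
class ToyAtom:
    """Hypothetical atom: one alternative position of a physical atom."""

    def __init__(self, altloc, occupancy, coord):
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord

    def get_coord(self):
        return self.coord


class ToyDisorderedAtom:
    """Hypothetical wrapper that forwards unknown attributes to one child."""

    def __init__(self):
        self.children = {}    # altloc to ToyAtom
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the position with the highest occupancy
        if self.selected is None or atom.occupancy > self.selected.occupancy:
            self.selected = atom

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only reached when normal lookup fails, i.e. for "uncaught" names
        return getattr(self.selected, name)


og = ToyDisorderedAtom()
og.add(ToyAtom('A', 0.4, (1.0, 2.0, 3.0)))
og.add(ToyAtom('B', 0.6, (1.5, 2.0, 3.0)))
print(og.get_coord())      # (1.5, 2.0, 3.0), altloc B has the highest occupancy
og.disordered_select('A')
print(og.get_coord())      # (1.0, 2.0, 3.0)
```
Code that only calls &amp;lt;code&amp;gt;get_coord&amp;lt;/code&amp;gt; never needs to know that it is talking to a wrapper.&lt;br /&gt;
&lt;br /&gt;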
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
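&lt;br /&gt;
For the curious, this is roughly what overloading the minus operator looks like. The sketch is plain Python&lt;br /&gt;
with a hypothetical &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; class and made-up coordinates; it is not the Bio.PDB source.&lt;br /&gt;
```python
import math


class ToyAtom:
    """Hypothetical atom holding only a coordinate triple."""

    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Subtracting two atoms yields their Euclidean distance
        return math.dist(self.coord, other.coord)


ca1 = ToyAtom(0.0, 0.0, 0.0)
ca2 = ToyAtom(3.0, 4.0, 0.0)
print(ca1 - ca2)  # 5.0
```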
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import calc_angle&lt;br /&gt;
&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
# angle at atom2, in radians&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import calc_dihedral&lt;br /&gt;
&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
# torsion angle around the atom2-atom3 bond, in radians&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
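Both the angle and the torsion angle boil down to dot and cross products. The sketch below spells this out in&lt;br /&gt;
plain Python on coordinate tuples; the function names are hypothetical, it does not use Bio.PDB, and the sign&lt;br /&gt;
of the torsion angle follows the usual right-handed convention (an assumption of this sketch).&lt;br /&gt;
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angle(p1, p2, p3):
    """Angle at p2 between the directions to p1 and p3, in radians."""
    u, v = sub(p1, p2), sub(p3, p2)
    cos_a = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    # Clamp to [-1, 1] to guard against rounding before acos
    return math.acos(max(-1.0, min(1.0, cos_a)))


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle about the p2-p3 bond, in radians."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    # atan2 form: numerically stable and keeps the sign
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


print(math.degrees(angle((1, 0, 0), (0, 0, 0), (0, 1, 0))))
print(math.degrees(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))))
```
Both calls describe a right angle, so each prints a value of 90 degrees up to rounding.&lt;br /&gt;
&lt;br /&gt;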
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
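To see what such a contact search computes, here is a brute-force sketch in plain Python. It is deliberately&lt;br /&gt;
not the KD tree approach: the hypothetical &amp;lt;code&amp;gt;contacts&amp;lt;/code&amp;gt; function checks every pair of points,&lt;br /&gt;
which gives the same pairs but scales quadratically with the number of atoms. The coordinates are made up.&lt;br /&gt;
```python
import math
from itertools import combinations


def contacts(coords, radius):
    """Return index pairs of points that are at most `radius` apart.

    Brute force over all pairs; a KD tree finds the same pairs faster.
    """
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
        if radius >= math.dist(a, b)
    ]


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
print(contacts(coords, 2.0))  # [(0, 1)]
```
&lt;br /&gt;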
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
(&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt;) or a C&amp;amp;alpha;-C&amp;amp;alpha; (&amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;) distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Polypeptide import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is possible to use the same builders&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| border="1"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
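&lt;br /&gt;
The counting behind HSE can be sketched in plain Python as follows. This is an illustration under stated&lt;br /&gt;
assumptions, not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code: the hypothetical &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt;&lt;br /&gt;
function takes the side chain direction as a ready-made vector, counts points lying exactly on the dividing&lt;br /&gt;
plane as 'up', and uses made-up coordinates.&lt;br /&gt;
```python
import math


def half_sphere_counts(center, direction, others, radius=13.0):
    """Count points in the half sphere along `direction` and opposite it.

    center    -- C-alpha position of the residue of interest
    direction -- vector pointing roughly towards the side chain
    others    -- C-alpha positions of the other residues
    Returns the pair (up, down).
    """
    up = down = 0
    for p in others:
        if math.dist(center, p) > radius:
            continue  # outside the sphere, ignore
        rel = [a - b for a, b in zip(p, center)]
        side = sum(r * d for r, d in zip(rel, direction))
        if side >= 0:
            up += 1    # same side as the side chain
        else:
            down += 1  # opposite side
    return up, down


others = [(0.0, 0.0, 5.0), (0.0, 0.0, -5.0), (1.0, 1.0, 2.0), (0.0, 0.0, 20.0)]
print(half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), others))  # (2, 1)
```
The sum of the two counts is the classical contact number for the same radius.&lt;br /&gt;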
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
from Bio.PDB import rotaxis&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi*120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross product (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;),&lt;br /&gt;
dot product (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;), angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
&lt;br /&gt;
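If you want to see the geometry without Bio.PDB, the same construction can be sketched in plain Python using&lt;br /&gt;
Rodrigues' rotation formula. The &amp;lt;code&amp;gt;rotate&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;virtual_cb&amp;lt;/code&amp;gt; functions are&lt;br /&gt;
hypothetical helpers, the rotation is assumed to follow the usual right-handed convention, and the backbone&lt;br /&gt;
coordinates are made up.&lt;br /&gt;
```python
import math


def rotate(v, axis, theta):
    """Rotate v by theta radians around axis (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the axis
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(a * b for a, b in zip(k, v))
    c, s = math.cos(theta), math.sin(theta)
    return tuple(
        v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
        for i in range(3)
    )


def virtual_cb(n, ca, c):
    """Place a virtual C-beta by rotating N by -120 degrees about CA-C."""
    n_rel = [a - b for a, b in zip(n, ca)]  # center at origin
    c_rel = [a - b for a, b in zip(c, ca)]
    cb_rel = rotate(n_rel, c_rel, -2.0 * math.pi / 3.0)
    return tuple(a + b for a, b in zip(cb_rel, ca))  # move back onto CA


# Made-up backbone coordinates, for illustration only
ca = (0.0, 0.0, 0.0)
n = (1.46, 0.0, 0.0)
c = (-0.55, 1.42, 0.0)
cb = virtual_cb(n, ca, c)
print(cb)
```
Since a rotation preserves lengths, the virtual C&amp;amp;beta; ends up at the same distance from C&amp;amp;alpha; as the N atom, which is why the text above says 'roughly'.&lt;br /&gt;
&lt;br /&gt;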
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix&lt;br /&gt;
computations'', 2nd ed., Golub, G. &amp;amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Superimposer&lt;br /&gt;
&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
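&lt;br /&gt;
What a right-multiplying rotation followed by a translation does to a single coordinate can be sketched in&lt;br /&gt;
plain Python. The &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; function below is a hypothetical stand-alone helper working on&lt;br /&gt;
tuples and nested lists, not the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; method itself.&lt;br /&gt;
```python
def transform(coord, rot, tran):
    """Return coord . rot + tran, with rot multiplying on the right."""
    rotated = [
        sum(coord[i] * rot[i][j] for i in range(3))
        for j in range(3)
    ]
    return tuple(r + t for r, t in zip(rotated, tran))


# 90 degree rotation about z, written for right multiplication
rot = [[0.0, 1.0, 0.0],
       [-1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0]]
tran = (1.0, 0.0, 0.0)
print(transform((1.0, 0.0, 0.0), rot, tran))  # (1.0, 1.0, 0.0)
```
Because the coordinate is treated as a row vector, this matrix is the transpose of the one you would use for left multiplication.&lt;br /&gt;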
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:32:36Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How is disorder handled? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
: Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As in the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
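&lt;br /&gt;
The values in the dictionary come straight from the text of the mmCIF file, so numeric fields will most likely need converting before use. A minimal sketch, assuming the values are strings; a small hand-written dictionary (hypothetical data) stands in for the result of &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; so that the snippet runs on its own:&lt;br /&gt;

```python
# Stand-in for MMCIF2Dict('1FAT.cif'); the values are made up for illustration
mmcif_dict = {
    '_exptl_crystal.density_percent_sol': '55.0',
    '_atom_site.Cartn_y': ['10.5', '11.25', '9.0'],
}

# Convert the list of y coordinates from strings to floats
y_values = [float(y) for y in mmcif_dict['_atom_site.Cartn_y']]
mean_y = sum(y_values) / len(y_values)
print(mean_y)
```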
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but about their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
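&lt;br /&gt;
As a further sketch of the &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; subclassing pattern described above, the following keeps only chain A. The class name &amp;lt;code&amp;gt;ChainASelect&amp;lt;/code&amp;gt; is hypothetical, and minimal stand-ins for &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and for a chain object are defined here so that the snippet runs without Biopython; in real code you would import &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; from &amp;lt;code&amp;gt;Bio.PDB&amp;lt;/code&amp;gt; and pass an instance of the subclass to &amp;lt;code&amp;gt;io.save&amp;lt;/code&amp;gt;:&lt;br /&gt;

```python
class Select(object):
    """Stand-in for Bio.PDB's Select: accept everything by default."""

    def accept_model(self, model):
        return 1

    def accept_chain(self, chain):
        return 1

    def accept_residue(self, residue):
        return 1

    def accept_atom(self, atom):
        return 1


class ChainASelect(Select):
    """Hypothetical selector that keeps chain A only."""

    def accept_chain(self, chain):
        return 1 if chain.get_id() == 'A' else 0


class FakeChain(object):
    """Minimal chain stand-in exposing get_id(), for demonstration only."""

    def __init__(self, chain_id):
        self.chain_id = chain_id

    def get_id(self):
        return self.chain_id


selector = ChainASelect()
kept = [c.get_id() for c in (FakeChain('A'), FakeChain('B'))
        if selector.accept_chain(c)]
print(kept)
```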
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture : &lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
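&lt;br /&gt;
To see why the hetero-flag is needed, here is a small sketch in which a plain dictionary stands in for a chain. The helper &amp;lt;code&amp;gt;full_id&amp;lt;/code&amp;gt; is hypothetical; it mimics the shortcut by expanding a bare integer into a full id with a blank hetero-flag and a blank insertion code:&lt;br /&gt;

```python
def full_id(res_id):
    """Expand an integer shortcut into a (hetero-flag, seq id, icode) tuple."""
    if isinstance(res_id, int):
        return (' ', res_id, ' ')
    return res_id


# An amino acid and a water sharing sequence identifier 100
chain = {
    (' ', 100, ' '): 'SER',
    ('W', 100, ' '): 'HOH',
}

residue = chain[full_id(100)]
water = chain[full_id(('W', 100, ' '))]
print(residue, water)
```

Without the hetero-flag both entries would have the same key, and one of them would be lost.&lt;br /&gt;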
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
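&lt;br /&gt;
The rule above can be sketched as follows. This is only an illustration of the described behaviour, not the parser's actual code; the function name &amp;lt;code&amp;gt;atom_id&amp;lt;/code&amp;gt; is hypothetical, and real spaces are used instead of the dots shown above:&lt;br /&gt;

```python
def atom_id(full_name, ids_in_residue):
    """Return the stripped name, or the full name if the stripped one is taken."""
    name = full_name.strip()
    if name in ids_in_residue:
        return full_name
    return name


# A C-alpha, a C-beta and a calcium atom in the same (hypothetical) residue
ids = []
for full_name in (' CA ', ' CB ', 'CA  '):
    ids.append(atom_id(full_name, ids))
print(ids)
```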
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see section [[#The Structure object]]).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
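&lt;br /&gt;
The forwarding idea can be illustrated with a few lines of plain Python. This is a simplified sketch, not the Bio.PDB implementation: the class names &amp;lt;code&amp;gt;SimpleAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;SimpleDisorderedAtom&amp;lt;/code&amp;gt; are hypothetical, and the highest-occupancy default is modelled on the description above:&lt;br /&gt;

```python
class SimpleAtom(object):
    """Hypothetical stand-in for an Atom with an altloc and an occupancy."""

    def __init__(self, altloc, occupancy):
        self.altloc = altloc
        self.occupancy = occupancy

    def get_occupancy(self):
        return self.occupancy


class SimpleDisorderedAtom(object):
    """Holds alternative positions; forwards unknown attributes to one."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # By default, select the position with the highest occupancy
        self.selected = max(self.children.values(),
                            key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward to the selected atom
        return getattr(self.selected, name)


atom = SimpleDisorderedAtom()
atom.add(SimpleAtom('A', 0.7))
atom.add(SimpleAtom('B', 0.3))
default_occupancy = atom.get_occupancy()   # forwarded to altloc A
atom.disordered_select('B')
selected_occupancy = atom.get_occupancy()  # forwarded to altloc B
print(default_occupancy, selected_occupancy)
```

Code that only calls ordinary atom methods never needs to know that it is dealing with a disordered atom.&lt;br /&gt;
&lt;br /&gt;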
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see section&lt;br /&gt;
[[#The Structure object]]).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. ' CA ')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
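&lt;br /&gt;
For readers curious how a minus sign can yield a distance, here is a minimal sketch of the&lt;br /&gt;
operator overloading involved. The &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; class is hypothetical and only&lt;br /&gt;
mimics the behaviour described above; it is not the Bio.PDB source.&lt;br /&gt;
&lt;br /&gt;

```python
import math


class ToyAtom:
    """Hypothetical minimal atom: a name plus Cartesian coordinates in angstrom."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord

    def __sub__(self, other):
        # Overloaded minus: return the Euclidean distance, not a vector.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = ToyAtom("CA", (0.0, 0.0, 0.0))
ca2 = ToyAtom("CA", (3.0, 4.0, 0.0))
distance = ca1 - ca2   # a plain float
```

Because the result is a plain number, it can be compared directly against a cutoff.&lt;br /&gt;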
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
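&lt;br /&gt;
As a worked illustration of what a torsion angle is, the sketch below computes one from four&lt;br /&gt;
points using only the standard library. The function name &amp;lt;code&amp;gt;toy_dihedral&amp;lt;/code&amp;gt; and its&lt;br /&gt;
helpers are made up for this example; it returns radians, and it is not meant to reproduce&lt;br /&gt;
the internals of &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def toy_dihedral(p1, p2, p3, p4):
    """Torsion angle in radians around the p2-p3 bond, between -pi and pi."""
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)   # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)   # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.atan2(y, x)


cis = toy_dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = toy_dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
gauche = toy_dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

The three test geometries correspond to an eclipsed (cis), an anti (trans) and a perpendicular arrangement of the outer points.&lt;br /&gt;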
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
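&lt;br /&gt;
To make clear what such a contact search returns, here is a brute-force sketch that checks&lt;br /&gt;
every pair of points against a distance cutoff. It deliberately does not use a KD tree, so&lt;br /&gt;
it scales quadratically and is only meant as an illustration; the function name&lt;br /&gt;
&amp;lt;code&amp;gt;toy_contacts&amp;lt;/code&amp;gt; is hypothetical and is not part of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;

```python
import itertools
import math


def toy_contacts(coords, cutoff):
    """Return index pairs (i, j) whose points lie within cutoff of each other."""
    pairs = []
    for (i, a), (j, b) in itertools.combinations(enumerate(coords), 2):
        if cutoff >= math.dist(a, b):
            pairs.append((i, j))
    return pairs


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0), (10.0, 1.0, 0.0)]
close_pairs = toy_contacts(coords, 2.0)
```

A KD tree gives the same kind of answer without visiting every pair, which is what makes it attractive for whole structures.&lt;br /&gt;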
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is possible to use &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. The latter class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
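&lt;br /&gt;
The counting idea behind HSE can be shown in a few lines of plain Python. This is only a&lt;br /&gt;
sketch under simplifying assumptions: the function name &amp;lt;code&amp;gt;toy_hse&amp;lt;/code&amp;gt; is hypothetical,&lt;br /&gt;
the side chain direction is passed in explicitly as a C&amp;amp;alpha;-C&amp;amp;beta; vector (as in the&lt;br /&gt;
HSE&amp;amp;beta; flavor), and the coordinates are invented. It is not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code.&lt;br /&gt;
&lt;br /&gt;

```python
import math


def toy_hse(center, direction, others, radius=13.0):
    """Count points in the half spheres along and against direction."""
    up = down = 0
    for point in others:
        offset = tuple(p - c for p, c in zip(point, center))
        if math.sqrt(sum(x * x for x in offset)) > radius:
            continue  # outside the sphere altogether
        # The sign of the dot product tells which half sphere the point is in.
        if sum(d * x for d, x in zip(direction, offset)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


ca = (0.0, 0.0, 0.0)
ca_to_cb = (0.0, 0.0, 1.0)   # assumed side chain direction
neighbours = [(1.0, 0.0, 5.0), (0.0, 2.0, 8.0), (3.0, 0.0, -4.0), (0.0, 0.0, 20.0)]
hse_up, hse_down = toy_hse(ca, ca_to_cb, neighbours)
```

The two counts together form the 2D measure; their sum is the number of neighbours inside the full sphere, i.e. the classical contact number.&lt;br /&gt;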
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
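&lt;br /&gt;
The rotation step in the virtual C&amp;amp;beta; example is an ordinary rotation of a vector around&lt;br /&gt;
an axis. The sketch below applies Rodrigues' rotation formula with the standard library only.&lt;br /&gt;
The function name &amp;lt;code&amp;gt;toy_rotate&amp;lt;/code&amp;gt; is hypothetical, and the right-hand sign convention&lt;br /&gt;
used here is an assumption of the sketch rather than a statement about &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;

```python
import math


def toy_rotate(v, axis, theta):
    """Rotate v by theta radians around axis (right-hand rule, Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)   # unit vector along the axis
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    c, s = math.cos(theta), math.sin(theta)
    return tuple(v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
                 for i in range(3))


# Rotating the x axis a quarter turn around z should land on the y axis.
quarter_turn = toy_rotate((1.0, 0.0, 0.0), (0.0, 0.0, 2.0), math.pi / 2)
```

The axis does not need to be normalised by the caller, which mirrors how the C&amp;amp;alpha;-C vector is used directly as the rotation axis in the example above.&lt;br /&gt;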
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix&lt;br /&gt;
Computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:31:23Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func.&lt;br /&gt;
Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'',&lt;br /&gt;
'''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a Python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.parse_pdb_header import parse_pdb_header&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method of a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object.&lt;br /&gt;
The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.PDBList import PDBList&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but about their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can behave in two ways:&lt;br /&gt;
a restrictive way and a permissive way (the permissive way is now the default).&lt;br /&gt;
The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
&lt;br /&gt;
&amp;lt;br clear=all&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.PDBParser import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
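&lt;br /&gt;
To see why the hetero-flag matters, here is a small sketch that uses an ordinary Python dictionary as a stand-in for a chain. The residues are hypothetical, and the &amp;lt;code&amp;gt;lookup&amp;lt;/code&amp;gt; helper only mimics the shortcut described above (it is not Bio.PDB code). Three residues share sequence identifier 100 and still get distinct ids:&lt;br /&gt;

```python
# Plain dictionary standing in for a chain. Keys are
# (hetero-flag, sequence identifier, insertion code) tuples.
chain = {
    (' ', 100, ' '): 'SER',      # ordinary amino acid
    ('H_GLC', 100, ' '): 'GLC',  # glucose with the same sequence identifier
    ('W', 100, ' '): 'HOH',      # water with the same sequence identifier
}


def lookup(chain, key):
    """Accept either a full id tuple or a bare sequence identifier."""
    if isinstance(key, int):
        # Shortcut form: blank hetero-flag and blank insertion code
        key = (' ', key, ' ')
    return chain[key]


print(lookup(chain, 100))                   # SER
print(lookup(chain, ('H_GLC', 100, ' ')))   # GLC
print(lookup(chain, ('W', 100, ' ')))       # HOH
```

Without the first tuple element, all three entries would collapse onto the same key.&lt;br /&gt;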
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots here stand for spaces. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
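&lt;br /&gt;
The rule can be sketched in a few lines. This is only an illustration, under the assumption that the first atom with a given stripped name keeps the short id and a later clashing atom falls back to its padded name. The function &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; is hypothetical, and the parser may resolve clashes differently:&lt;br /&gt;

```python
def atom_ids(full_names):
    """Derive atom ids from 4-character PDB atom names.

    Spaces are stripped unless the stripped name is already in use
    within the same residue, in which case the padded name is kept.
    """
    ids = []
    for full in full_names:
        short = full.strip()
        # Fall back to the padded name if the stripped name is taken
        ids.append(full if short in ids else short)
    return ids


# A made-up residue with a C-alpha atom (' CA ') and a calcium atom ('CA  ')
print(atom_ids([' N  ', ' CA ', 'CA  ']))
```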
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the SMCRA diagram above).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
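&lt;br /&gt;
The forwarding idea can be illustrated with a toy wrapper. The classes &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedAtom&amp;lt;/code&amp;gt; below are made up for this sketch and are not the Bio.PDB classes. They only show how uncaught calls can be passed on to a selected child via &amp;lt;code&amp;gt;__getattr__&amp;lt;/code&amp;gt;, with the highest-occupancy child selected by default:&lt;br /&gt;

```python
class ToyAtom:
    """Toy atom with only the fields needed for this sketch."""

    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_altloc(self):
        return self.altloc

    def get_occupancy(self):
        return self.occupancy


class ToyDisorderedAtom:
    """Holds alternative ToyAtom objects and behaves like the selected one."""

    def __init__(self):
        self.children = {}    # maps altloc to ToyAtom
        self.selected = None

    def add(self, atom):
        self.children[atom.get_altloc()] = atom
        # Default selection: the child with the highest occupancy
        self.selected = max(self.children.values(),
                            key=ToyAtom.get_occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only called for attributes the wrapper itself does not have,
        # so every uncaught call ends up at the selected child.
        return getattr(self.selected, name)


og = ToyDisorderedAtom()
og.add(ToyAtom('OG', 'A', 0.3))
og.add(ToyAtom('OG', 'B', 0.7))
print(og.get_altloc())       # B (highest occupancy)
og.disordered_select('A')
print(og.get_altloc())       # A
```

Code that only calls ordinary atom methods never needs to know that it is talking to a wrapper.&lt;br /&gt;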
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
SMCRA diagram above).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
method calls to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
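&lt;br /&gt;
For the curious: operator overloading of this kind takes very little code. The sketch below uses a made-up &amp;lt;code&amp;gt;CoordAtom&amp;lt;/code&amp;gt; class (not the Bio.PDB &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; class) whose &amp;lt;code&amp;gt;__sub__&amp;lt;/code&amp;gt; method returns the Euclidean distance between two sets of coordinates:&lt;br /&gt;

```python
import math


class CoordAtom:
    """Toy stand-in for an atom: a name and Cartesian coordinates."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = coord

    def __sub__(self, other):
        # Euclidean distance between the two coordinate triples
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(self.coord, other.coord)))


ca1 = CoordAtom('CA', (0.0, 0.0, 0.0))
ca2 = CoordAtom('CA', (3.0, 4.0, 0.0))
distance = ca1 - ca2
print(distance)   # 5.0
```

The result is in the same units as the coordinates.&lt;br /&gt;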
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
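&lt;br /&gt;
As a cross-check on what is being measured, the angle at the middle atom can also be computed by hand from the three coordinate triples. The function &amp;lt;code&amp;gt;angle_at_vertex&amp;lt;/code&amp;gt; below is a plain-Python sketch (not part of Bio.PDB), assuming the angle is taken at the second point and expressed in radians:&lt;br /&gt;

```python
import math


def angle_at_vertex(p1, p2, p3):
    """Angle in radians between p1-p2 and p3-p2, for 3D coordinate tuples."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against rounding error before acos
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


# Made-up coordinates forming a right angle at the origin
angle = angle_at_vertex((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(math.degrees(angle))
```

Use &amp;lt;code&amp;gt;math.degrees&amp;lt;/code&amp;gt; if you prefer degrees over radians.&lt;br /&gt;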
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
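&lt;br /&gt;
The torsion angle itself is the angle between the plane through the first three atoms and the plane through the last three. A plain-Python sketch of that calculation follows. The helper names are made up for illustration, the result is in radians, and the sign convention (based on &amp;lt;code&amp;gt;atan2&amp;lt;/code&amp;gt;) is an assumption that should be checked against &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; before relying on it:&lt;br /&gt;

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians around the p2-p3 axis."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1 = cross(b1, b2)    # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)    # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(dot(b2, b2))
    # atan2 of a sine-like and a cosine-like term gives a signed angle
    y = dot(cross(n1, n2), b2) / b2_len
    x = dot(n1, n2)
    return math.atan2(y, x)


# Made-up coordinates: the two planes are perpendicular to each other
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                 (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(math.degrees(angle))
```

A cis arrangement gives an angle of 0 and a trans arrangement an angle of magnitude &amp;amp;pi;.&lt;br /&gt;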
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Polypeptide import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is also possible to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; objects.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
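&lt;br /&gt;
The definition above is easy to sketch in plain Python (3.8 or later), given a list of surface&lt;br /&gt;
points. This is only an illustration of the definition, not the code in &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt;:&lt;br /&gt;
the helper names and the tuple-based inputs are assumptions, and the nearest surface point is&lt;br /&gt;
found by a plain linear scan.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def min_dist(point, surface):
    """Distance from a point to the nearest surface point (hypothetical helper)."""
    return min(math.dist(point, s) for s in surface)


def residue_depth(atom_coords, surface):
    """Average distance of a residue's atoms from the surface points."""
    depths = [min_dist(a, surface) for a in atom_coords]
    return sum(depths) / len(depths)


def ca_depth(ca_coord, surface):
    """Distance of the C-alpha atom from the surface points."""
    return min_dist(ca_coord, surface)
```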
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
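&lt;br /&gt;
The counting idea behind the HSE&amp;amp;beta; flavor can be sketched in a few lines of plain Python&lt;br /&gt;
(3.8 or later). This is a simplified illustration, not the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; code: the&lt;br /&gt;
function name, the tuple coordinates and the handling of atoms lying exactly on the dividing&lt;br /&gt;
plane are all assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def half_sphere_exposure(ca, cb, other_cas, radius=13.0):
    """Return (up, down) counts of neighbouring CA atoms (hypothetical helper)."""
    # Side chain direction: from the CA atom towards the CB atom
    side = tuple(b - a for a, b in zip(ca, cb))
    up = down = 0
    for other in other_cas:
        if math.dist(ca, other) > radius:
            continue  # outside the sphere, not counted at all
        rel = tuple(o - a for a, o in zip(ca, other))
        # A non-negative dot product means the side chain's half sphere
        if sum(s * r for s, r in zip(side, rel)) >= 0:
            up += 1
        else:
            down += 1
    return up, down
```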
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi*120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
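&lt;br /&gt;
For the curious, a rotation around an arbitrary axis can be written out by hand with&lt;br /&gt;
Rodrigues' formula. The sketch below is plain Python and is not the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt;&lt;br /&gt;
implementation: the function name is hypothetical, it rotates a vector directly instead of&lt;br /&gt;
returning a matrix, and its right-handed sign convention is an assumption that may not match&lt;br /&gt;
Bio.PDB's.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate vector v by theta radians around axis (hypothetical helper)."""
    # Normalise the axis so the formula below applies
    norm = math.sqrt(sum(a * a for a in axis))
    k = tuple(a / norm for a in axis)
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    c, s = math.cos(theta), math.sin(theta)
    # Rodrigues: v*cos + (k x v)*sin + k*(k.v)*(1 - cos)
    return tuple(v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1 - c)
                 for i in range(3))
```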
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix computations'',&lt;br /&gt;
2nd ed., Golub, G. &amp;amp; Van Loan (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
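&lt;br /&gt;
As a reminder of what is being minimized, here's the RMSD between two equally long lists of&lt;br /&gt;
coordinates in plain Python (3.8 or later). It only measures the deviation for the coordinates&lt;br /&gt;
as given and does no fitting; the &amp;lt;code&amp;gt;rmsd&amp;lt;/code&amp;gt; function name and the tuple inputs are&lt;br /&gt;
assumptions, not part of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rmsd(fixed, moving):
    """Root mean square deviation between paired coordinates (hypothetical helper)."""
    if len(fixed) != len(moving):
        raise ValueError("both lists need the same number of atoms")
    # Sum of squared distances between corresponding atoms
    total = sum(math.dist(a, b) ** 2 for a, b in zip(fixed, moving))
    return math.sqrt(total / len(fixed))
```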
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T08:27:04Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* What's the overall layout of a Structure object? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements&lt;br /&gt;
([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
: Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are already quite a few Python based/Python&lt;br /&gt;
aware solutions that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# the with block closes the file even if parsing fails&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
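&lt;br /&gt;
To picture the two layouts, here's a small sketch that builds the local path for a given PDB&lt;br /&gt;
identifier. The &amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; helper is hypothetical and not part of&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;; it assumes that the PDB-style subdirectory is named after the two middle&lt;br /&gt;
characters of the identifier, as on the PDB ftp site, and that files are named as in the&lt;br /&gt;
&amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; example.&lt;br /&gt;
&lt;br /&gt;
```python
import os


def local_pdb_path(pdb_id, pdir, flat=False):
    """Build the expected local path of a PDB entry (hypothetical helper)."""
    code = pdb_id.lower()
    name = "pdb%s.ent" % code
    if flat:
        # Everything in one directory, as with the -d option
        return os.path.join(pdir, name)
    # Assumed PDB-style layout: subdirectory = middle two characters of the ID
    return os.path.join(pdir, code[1:3], name)
```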
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forgetting about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[Image:Smcra.png|600px|left|frame|Diagram of SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.]]&lt;br /&gt;
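&lt;br /&gt;
To make the nesting concrete, here is a toy model of the SMCRA layout built from ordinary dictionaries. It is only a sketch of the idea and does not use the Bio.PDB classes; the ids, coordinates and the &amp;lt;code&amp;gt;iter_atoms&amp;lt;/code&amp;gt; helper are made up for illustration.&lt;br /&gt;

```python
# Structure, models, chains, residues and atoms as nested dictionaries
structure = {
    0: {                                   # model id
        'A': {                             # chain id
            (' ', 1, ' '): {               # residue id
                'N': (0.0, 0.0, 0.0),      # atom id mapped to coordinates
                'CA': (1.5, 0.0, 0.0),
            },
            (' ', 2, ' '): {
                'N': (3.0, 1.0, 0.0),
                'CA': (4.5, 1.0, 0.0),
            },
        },
    },
}


def iter_atoms(structure):
    """Yield (model id, chain id, residue id, atom id) for every atom."""
    for model_id, model in structure.items():
        for chain_id, chain in model.items():
            for residue_id, residue in chain.items():
                for atom_id in residue:
                    yield model_id, chain_id, residue_id, atom_id


atom_paths = list(iter_atoms(structure))
ca_coord = structure[0]['A'][(' ', 2, ' ')]['CA']
```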
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
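&lt;br /&gt;
The lookup rule can be imitated in a few lines of plain Python. The sketch below is not the Bio.PDB &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; class: &amp;lt;code&amp;gt;ToyChain&amp;lt;/code&amp;gt; and its methods are hypothetical, and only show how a bare integer can be expanded to a full id while hetero-residues and waters with the same sequence identifier stay distinct.&lt;br /&gt;

```python
class ToyChain(object):
    """Hypothetical stand-in for a chain; not the Bio.PDB Chain class."""

    def __init__(self):
        self._residues = {}

    def add(self, full_id, name):
        # full_id is (hetero-flag, sequence identifier, insertion code)
        self._residues[full_id] = name

    def _expand(self, key):
        # A bare integer is shorthand for a blank hetero-flag and insertion code
        if isinstance(key, int):
            return (' ', key, ' ')
        return key

    def __getitem__(self, key):
        return self._residues[self._expand(key)]

    def __contains__(self, key):
        return self._expand(key) in self._residues


chain = ToyChain()
chain.add((' ', 100, ' '), 'SER')      # ordinary amino acid
chain.add(('H_GLC', 100, ' '), 'GLC')  # glucose with the same sequence identifier
chain.add(('W', 100, ' '), 'HOH')      # water with the same sequence identifier

short = chain[100]
full = chain[(' ', 100, ' ')]
sugar = chain[('H_GLC', 100, ' ')]
```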
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; the dots stand for spaces here). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
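&lt;br /&gt;
One possible reading of this rule, sketched in plain Python: strip the spaces, unless two names in the same residue would then collide. The &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; helper is hypothetical, and the exact tie-breaking used by Bio.PDB itself may differ (here both clashing names keep their spaces).&lt;br /&gt;

```python
from collections import Counter


def atom_ids(full_names):
    """Map 4-character PDB atom names to ids, keeping spaces on a clash."""
    stripped = [name.strip() for name in full_names]
    counts = Counter(stripped)
    ids = []
    for full, short in zip(full_names, stripped):
        # Keep the padded name only if stripping would produce a duplicate id
        ids.append(full if counts[short] > 1 else short)
    return ids


plain = atom_ids([' N  ', ' CA ', ' C  ', ' O  '])
clash = atom_ids([' CA ', 'CA  ', ' N  '])
```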
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the SMCRA figure above).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
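&lt;br /&gt;
The forwarding behaviour is easy to mimic in plain Python. The classes below are hypothetical toys, not the Bio.PDB &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; classes; they only illustrate a wrapper that passes unknown attribute lookups on to the currently selected child, which defaults to the child with the highest occupancy.&lt;br /&gt;

```python
class ToyAtom(object):
    def __init__(self, altloc, occupancy, coord):
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord

    def get_coord(self):
        return self.coord


class ToyDisorderedAtom(object):
    """Hypothetical wrapper; forwards unknown attributes to the selected child."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the child with the highest occupancy seen so far
        if self.selected is None or atom.occupancy > self.selected.occupancy:
            self.selected = atom

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for "uncaught" calls
        return getattr(self.selected, name)


og = ToyDisorderedAtom()
og.add(ToyAtom('A', 0.4, (1.0, 2.0, 3.0)))
og.add(ToyAtom('B', 0.6, (1.5, 2.5, 3.5)))
default_coord = og.get_coord()     # forwarded to altloc B (higher occupancy)
og.disordered_select('A')
selected_coord = og.get_coord()    # now forwarded to altloc A
```
&lt;br /&gt;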
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
SMCRA figure above).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
method calls to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
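&lt;br /&gt;
If you are curious how such an overloaded minus operator can work, here is a minimal plain-Python sketch. &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; is a hypothetical class, not the Bio.PDB &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; class; it just returns the Euclidean distance between two coordinate triples.&lt;br /&gt;

```python
import math


class ToyAtom(object):
    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Euclidean distance between the two coordinate triples
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = ToyAtom(0.0, 0.0, 0.0)
ca2 = ToyAtom(3.0, 4.0, 0.0)
distance = ca1 - ca2
```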
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
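&lt;br /&gt;
For reference, the underlying geometry is just the angle at the middle point between the directions to the two outer points. The plain-Python sketch below (the &amp;lt;code&amp;gt;angle&amp;lt;/code&amp;gt; helper is hypothetical and works on coordinate tuples rather than &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; objects) returns that angle in radians.&lt;br /&gt;

```python
import math


def angle(p1, p2, p3):
    """Angle in radians at p2, between the directions to p1 and to p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against rounding just outside the acos domain
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


right_angle = angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```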
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
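&lt;br /&gt;
The torsion angle itself can be written down with a couple of cross products. The following plain-Python sketch uses the common atan2 formulation on coordinate tuples; the helper names are hypothetical, and it is meant as an illustration of the geometry rather than a copy of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module's implementation.&lt;br /&gt;

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians, between -pi and pi."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)   # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


torsion = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                   (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
```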
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
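&lt;br /&gt;
To see what such a contact search returns, here is a deliberately naive plain-Python version. Note that this is a brute-force loop over all pairs, not a KD tree, so it is far slower on real structures; the &amp;lt;code&amp;gt;close_pairs&amp;lt;/code&amp;gt; helper and the sample coordinates are made up for illustration.&lt;br /&gt;

```python
import math
from itertools import combinations


def close_pairs(coords, radius):
    """Return index pairs (i, j) whose points lie within radius of each other."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
        if radius >= dist:
            pairs.append((i, j))
    return pairs


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
contacts = close_pairs(coords, 2.0)
```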
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder: &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; builds polypeptides using a C-N&lt;br /&gt;
distance criterion, and &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt; using a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; object to get the sequence&lt;br /&gt;
as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, or to get a list of C&amp;amp;alpha; atoms.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. However, it is possible to use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
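&lt;br /&gt;
The idea behind a distance criterion can be sketched in plain Python: walk along the chain and start a new segment whenever two consecutive residues are too far apart. The &amp;lt;code&amp;gt;split_by_distance&amp;lt;/code&amp;gt; helper, the sample trace and the 4.3 cutoff are assumptions chosen for illustration only; check the builder classes for the values they actually use.&lt;br /&gt;

```python
import math


def split_by_distance(ca_coords, cutoff):
    """Split consecutive points into segments wherever a gap exceeds cutoff."""
    segments = []
    current = []
    for coord in ca_coords:
        if current:
            gap = math.sqrt(sum((a - b) ** 2 for a, b in zip(coord, current[-1])))
            if gap > cutoff:
                # Chain break: close the current segment and start a new one
                segments.append(current)
                current = []
        current.append(coord)
    if current:
        segments.append(current)
    return segments


# Three residues 3.8 units apart, then a chain break, then two more residues
trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
         (20.0, 0.0, 0.0), (23.8, 0.0, 0.0)]
segments = split_by_distance(trace, 4.3)
```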
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
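&lt;br /&gt;
Given a set of surface points, the two depth values boil down to nearest-point distances. The plain-Python sketch below takes the surface as a list of coordinate tuples and does not call MSMS; &amp;lt;code&amp;gt;depth&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;residue_depth&amp;lt;/code&amp;gt; and the sample data are hypothetical.&lt;br /&gt;

```python
import math


def depth(point, surface):
    """Distance from a point to the nearest surface vertex."""
    return min(math.sqrt(sum((a - b) ** 2 for a, b in zip(point, s)))
               for s in surface)


def residue_depth(atom_coords, surface):
    """Average depth over all atoms of a residue."""
    return sum(depth(c, surface) for c in atom_coords) / len(atom_coords)


surface = [(0.0, 0.0, 5.0), (0.0, 0.0, -5.0)]
atoms = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0)]   # first atom plays the role of CA
rd_value = residue_depth(atoms, surface)
ca_value = depth(atoms[0], surface)
```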
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. The class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
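&lt;br /&gt;
The counting itself is simple enough to sketch in plain Python. In the toy below the side-chain direction is passed in directly, which is an assumption made for illustration (how that direction is derived differs between the two HSE flavors), and &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; is a hypothetical helper, not part of &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt;.&lt;br /&gt;

```python
import math


def half_sphere_counts(center, direction, others, radius=13.0):
    """Count points within radius, split by the half sphere they fall in."""
    up = down = 0
    for point in others:
        rel = [p - c for p, c in zip(point, center)]
        dist = math.sqrt(sum(r * r for r in rel))
        if dist > radius:
            continue
        # The sign of the projection on the direction picks the half sphere
        if sum(r * d for r, d in zip(rel, direction)) >= 0.0:
            up += 1
        else:
            down += 1
    return up, down


others = [(0.0, 0.0, 5.0), (1.0, 0.0, 8.0), (0.0, 2.0, -6.0), (0.0, 0.0, 20.0)]
up, down = half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), others)
```

The sum of the two counts is the number of neighbours inside the full sphere, which corresponds to the classical contact number mentioned above.&lt;br /&gt;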
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
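&lt;br /&gt;
What such a mapping looks like can be shown with a plain-Python sketch for the pairwise case. It assumes two gapped sequences of equal length (gaps written as '-') and, for each structure, a residue list in sequence order; &amp;lt;code&amp;gt;map_residues&amp;lt;/code&amp;gt; and the sample labels are hypothetical and this is not the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt; interface.&lt;br /&gt;

```python
def map_residues(aligned_a, aligned_b, residues_a, residues_b):
    """Pair up residues that share a column in a gapped pairwise alignment."""
    pairs = []
    ia = ib = 0
    for col_a, col_b in zip(aligned_a, aligned_b):
        res_a = res_b = None
        if col_a != '-':
            res_a = residues_a[ia]
            ia += 1
        if col_b != '-':
            res_b = residues_b[ib]
            ib += 1
        # A gap in one sequence leaves None on that side of the pair
        pairs.append((res_a, res_b))
    return pairs


pairs = map_residues('SNV-E', 'S-VAE',
                     ['S1', 'N2', 'V3', 'E4'],
                     ['S10', 'V11', 'A12', 'E13'])
```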
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
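&lt;br /&gt;
The geometry of the virtual C&amp;amp;beta; construction can also be sketched without Biopython. The following is a minimal illustration, assuming plain 3-element lists as coordinates and a right-handed rotation; &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;virtual_cb&amp;lt;/code&amp;gt; are hypothetical helper names (not part of Bio.PDB), and the rotation uses Rodrigues' formula rather than the actual &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; implementation:&lt;br /&gt;

```python
from math import cos, sin, sqrt, pi


def rotate_about_axis(v, axis, angle):
    """Rotate 3-vector v by angle (radians) about axis, using Rodrigues' formula."""
    norm = sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]  # unit vector along the rotation axis
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    c, s = cos(angle), sin(angle)
    return [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
            for i in range(3)]


def virtual_cb(n, ca, c):
    """Approximate C-beta: rotate N by -120 degrees about the CA-C axis."""
    # center at origin (CA becomes the origin)
    n0 = [n[i] - ca[i] for i in range(3)]
    c0 = [c[i] - ca[i] for i in range(3)]
    cb0 = rotate_about_axis(n0, c0, -pi * 120.0 / 180.0)
    # put back on top of the CA atom
    return [cb0[i] + ca[i] for i in range(3)]
```

Because a rotation preserves lengths, the virtual C&amp;amp;beta; produced this way sits at the N-C&amp;amp;alpha; distance from C&amp;amp;alpha;, so the result is an approximation of where a real C&amp;amp;beta; would be.&lt;br /&gt;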
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix computations'',&lt;br /&gt;
2nd ed., Golub, G. &amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
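For reference, the quantity being minimized can be written down in a few lines. This is a minimal sketch, assuming two equally long lists of (x, y, z) tuples that have already been superimposed; &amp;lt;code&amp;gt;rmsd&amp;lt;/code&amp;gt; is a hypothetical helper, not a Bio.PDB function, and it does not perform the fit itself:&lt;br /&gt;

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both lists must contain the same number of atoms")
    if not coords_a:
        raise ValueError("need at least one atom pair")
    total = 0.0
    for a, b in zip(coords_a, coords_b):
        # squared distance between the two paired atoms
        total += sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sqrt(total / len(coords_a))
```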
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
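&lt;br /&gt;
The two steps can be sketched as follows. This is only an illustration: &amp;lt;code&amp;gt;superimpose_on_site&amp;lt;/code&amp;gt; is a hypothetical helper name, and the argument &amp;lt;code&amp;gt;sup&amp;lt;/code&amp;gt; is assumed to behave like the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object described above (with &amp;lt;code&amp;gt;set_atoms&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;apply&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt;):&lt;br /&gt;

```python
def superimpose_on_site(sup, fixed_site, moving_site, all_moving_atoms):
    """Fit on the active-site atoms only, then move the whole molecule.

    fixed_site and moving_site are equally long lists of paired atoms;
    all_moving_atoms is the full atom list of the molecule to be moved.
    """
    # rotation/translation is computed from the active-site atoms only
    sup.set_atoms(fixed_site, moving_site)
    # ...but applied to every atom of the moving molecule
    sup.apply(all_moving_atoms)
    # note: this RMSD refers to the active-site atoms, not the whole molecule
    return sup.rms
```

A call would then look like &amp;lt;code&amp;gt;superimpose_on_site(Superimposer(), fixed_site, moving_site, list(moving_structure.get_atoms()))&amp;lt;/code&amp;gt;, where the three atom lists are whatever selection you have made yourself.&lt;br /&gt;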
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T07:59:00Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* What's the overall layout of a Structure object? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.MMCIF2Dict import MMCIF2Dict&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
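&lt;br /&gt;
Because a tag may map to either a single value or a list, code that consumes the dictionary often has to cope with both. A minimal sketch, assuming the values are stored as strings (check this for your Biopython version); &amp;lt;code&amp;gt;as_float_list&amp;lt;/code&amp;gt; is a hypothetical helper, not part of Bio.PDB:&lt;br /&gt;

```python
def as_float_list(value):
    """Return mmCIF value(s) as a list of floats.

    Accepts either a single string or a list of strings, since a tag may
    map to one value or to many.
    """
    if isinstance(value, str):
        value = [value]
    return [float(v) for v in value]
```

With this helper, &amp;lt;code&amp;gt;as_float_list(mmcif_dict['_atom_site.Cartn_y'])&amp;lt;/code&amp;gt; would give the ''y'' coordinates as numbers.&lt;br /&gt;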
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can deal with&lt;br /&gt;
such errors in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forget about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
[[File:Smcra.png|200px|thumb|left|alt text]] &lt;br /&gt;
&lt;br /&gt;
Figure: UML diagram of the SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object. Full lines with diamonds denote aggregation, full lines with&lt;br /&gt;
arrows denote referencing, full lines with triangles denote inheritance&lt;br /&gt;
and dashed lines with triangles denote interface realization.&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
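&lt;br /&gt;
Conceptually, going up one level amounts to collecting the distinct parents of a list of entities. A minimal sketch of that idea, relying only on the &amp;lt;code&amp;gt;get_parent&amp;lt;/code&amp;gt; method mentioned earlier; &amp;lt;code&amp;gt;unique_parents&amp;lt;/code&amp;gt; is a hypothetical helper and is not claimed to be how &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; is implemented:&lt;br /&gt;

```python
def unique_parents(children):
    """Collect the distinct parents of the given entities, in first-seen order."""
    seen = set()
    parents = []
    for child in children:
        parent = child.get_parent()
        # compare by object identity so each parent is listed only once
        if id(parent) not in seen:
            seen.add(id(parent))
            parents.append(parent)
    return parents
```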
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
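&lt;br /&gt;
The shortcut can be pictured as a small expansion step. A minimal sketch, assuming only the rule stated above (a bare integer stands for a blank hetero-flag and a blank insertion code); &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; is a hypothetical helper, not a Bio.PDB function:&lt;br /&gt;

```python
def full_residue_id(res_id):
    """Expand an integer shortcut such as 100 into (' ', 100, ' ').

    A tuple (hetero-flag, sequence identifier, insertion code) is
    returned unchanged.
    """
    if isinstance(res_id, int):
        return (' ', res_id, ' ')
    return tuple(res_id)
```

A hetero-residue or a residue with an insertion code therefore always needs the full tuple, e.g. &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;.&lt;br /&gt;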
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots stand for spaces here. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the SMCRA diagram).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
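&lt;br /&gt;
The forwarding idea can be sketched in a few lines of plain Python. The classes &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedAtom&amp;lt;/code&amp;gt; below are hypothetical and far simpler than the real Bio.PDB classes; they only illustrate how a wrapper can behave like its selected child, defaulting to the child with the highest occupancy.&lt;br /&gt;

```python
class ToyAtom:
    # Hypothetical minimal atom: a name, an altloc and an occupancy.
    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_occupancy(self):
        return self.occupancy


class ToyDisorderedAtom:
    # Hypothetical wrapper that behaves like its currently selected child.
    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the child with the highest occupancy.
        self.selected = max(self.children.values(), key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, attr):
        # Only called when normal lookup fails: forward to the selected child.
        return getattr(self.selected, attr)


og = ToyDisorderedAtom()
og.add(ToyAtom('OG', 'A', 0.3))
og.add(ToyAtom('OG', 'B', 0.7))
print(og.altloc, og.get_occupancy())   # B 0.7
og.disordered_select('A')
print(og.altloc, og.get_occupancy())   # A 0.3
```

Code that never touches &amp;lt;code&amp;gt;disordered_select&amp;lt;/code&amp;gt; simply sees one atom, which is the point of the design.&lt;br /&gt;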
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the SMCRA diagram).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
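&lt;br /&gt;
For the curious, here is a minimal sketch of what overloading the minus operator means. &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; is a hypothetical class that only stores coordinates; it is not the Bio.PDB &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; class, and the coordinates are made up.&lt;br /&gt;

```python
import math


class ToyAtom:
    # Hypothetical stand-in for an atom: it only stores (x, y, z) coordinates.
    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Overloaded minus: return the Euclidean distance, not a vector.
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(self.coord, other.coord)))


ca1 = ToyAtom(0.0, 0.0, 0.0)
ca2 = ToyAtom(3.0, 4.0, 0.0)
print(ca1 - ca2)   # 5.0
```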
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
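&lt;br /&gt;
The geometry behind such an angle calculation can be sketched in plain Python. The function &amp;lt;code&amp;gt;toy_calc_angle&amp;lt;/code&amp;gt; is a hypothetical helper, not the Bio.PDB code; it assumes that the angle is measured at the second point and it returns radians.&lt;br /&gt;

```python
import math


def toy_calc_angle(p1, p2, p3):
    # Angle (in radians) at p2 between the directions p2-p1 and p2-p3.
    a = [x - y for x, y in zip(p1, p2)]
    b = [x - y for x, y in zip(p3, p2)]
    dot = sum(x * y for x, y in zip(a, b))
    norms = (math.sqrt(sum(x * x for x in a))
             * math.sqrt(sum(x * x for x in b)))
    # Clamp to [-1, 1] to guard against rounding before taking acos.
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.acos(cosine)


angle = toy_calc_angle((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(round(math.degrees(angle), 6))   # 90.0
```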
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
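&lt;br /&gt;
As a rough sketch of the underlying math, a torsion angle can be computed from four points with cross products and &amp;lt;code&amp;gt;atan2&amp;lt;/code&amp;gt;. This is the common textbook formulation, written with hypothetical helper names; I am not claiming it is line for line what &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; does internally. The result is in radians.&lt;br /&gt;

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def toy_calc_dihedral(p1, p2, p3, p4):
    # Bond vectors along the chain p1-p2-p3-p4.
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    # Normals of the planes (p1, p2, p3) and (p2, p3, p4).
    n1, n2 = cross(b1, b2), cross(b2, b3)
    # atan2 keeps the sign, so the result covers the full circle.
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


angle = toy_calc_dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0),
                          (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(round(math.degrees(angle), 6))   # 90.0
```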
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using the C-N distance criterion&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using the CA-CA distance criterion&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is possible to use &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class=wikitable&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. Its methods return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
# (pi comes from the math module: from math import pi)&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross product (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;),&lt;br /&gt;
dot product (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;), angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
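&lt;br /&gt;
If you want to see what a rotation around an axis does without Bio.PDB at hand, the same virtual C&amp;amp;beta; construction can be sketched in plain Python with Rodrigues' rotation formula. The helper &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; and the backbone coordinates are hypothetical; this is a sketch of the geometry under those assumptions, not the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; implementation.&lt;br /&gt;

```python
import math


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def rotate_about_axis(v, axis, theta):
    # Rodrigues' formula: right-handed rotation of v by theta radians
    # about the given axis (the axis is normalized first).
    length = math.sqrt(sum(x * x for x in axis))
    k = tuple(x / length for x in axis)
    k_dot_v = sum(x * y for x, y in zip(k, v))
    k_cross_v = cross(k, v)
    c, s = math.cos(theta), math.sin(theta)
    return tuple(v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
                 for i in range(3))


# Hypothetical, roughly glycine-like backbone coordinates (in angstrom).
n = (-0.5, 1.37, 0.0)
ca = (0.0, 0.0, 0.0)
c = (1.5, 0.0, 0.0)

# Center on CA, rotate N by -120 degrees about the CA-C axis, move back.
n_centered = tuple(x - y for x, y in zip(n, ca))
axis = tuple(x - y for x, y in zip(c, ca))
cb_centered = rotate_about_axis(n_centered, axis, -math.pi * 120.0 / 180.0)
cb = tuple(x + y for x, y in zip(cb_centered, ca))
print(cb)
```

A rotation keeps lengths intact, so the virtual C&amp;amp;beta; ends up at the same distance from C&amp;amp;alpha; as the N atom it was derived from.&lt;br /&gt;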
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix computations'', 2nd ed.,&lt;br /&gt;
Golub, G. &amp;amp; Van Loan (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/File:Smcra.png</id>
		<title>File:Smcra.png</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/File:Smcra.png"/>
				<updated>2013-03-26T07:57:41Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: Structure/Model/Chain/Residue/Atom
from Bio.PDB&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;Structure/Model/Chain/Residue/Atom&lt;br /&gt;
from Bio.PDB&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T03:38:57Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How do I measure torsion angles? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac,...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
: Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As in the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
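&lt;br /&gt;
Since a tag can map either to a single value or to a list, code that reads the dictionary often has to cope with both. The sketch below is not part of Bio.PDB: it uses a hand-written dictionary with made-up values in place of a real &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; result, it assumes that the values are stored as strings, and the &amp;lt;code&amp;gt;as_list&amp;lt;/code&amp;gt; helper is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
def as_list(value):
    """Return a list unchanged, wrap a single value in a list (hypothetical helper)."""
    return value if isinstance(value, list) else [value]


# Hand-written stand-in for the dictionary; the numbers are invented
mmcif_dict = {
    '_exptl_crystal.density_percent_sol': '55.2',
    '_atom_site.Cartn_y': ['10.1', '11.4', '9.8'],
}

# Both lookups can now be treated the same way
y_values = [float(y) for y in as_list(mmcif_dict['_atom_site.Cartn_y'])]
solvent = [float(s) for s in as_list(mmcif_dict['_exptl_crystal.density_percent_sol'])]
print(y_values)
print(solvent)
```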
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# The with statement closes the file even if parsing fails&lt;br /&gt;
with open(filename) as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done with the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;&lt;br /&gt;
object. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
the directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). To handle this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
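&lt;br /&gt;
To make the idea behind &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; a bit more concrete, here is a rough sketch of how a writer can consult such a selector at every level of the hierarchy. It does not use Bio.PDB and is not how &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt; is actually implemented: the &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class here is a stand-in with only two of the four methods, chains are plain lists of residue names, and &amp;lt;code&amp;gt;filter_structure&amp;lt;/code&amp;gt; is a hypothetical helper.&lt;br /&gt;
&lt;br /&gt;
```python
class Select:
    """Stand-in for the selector base class: accept everything by default."""

    def accept_chain(self, chain_id):
        return 1

    def accept_residue(self, resname):
        return 1


class ChainAGlySelect(Select):
    """Keep only the glycine residues of chain A."""

    def accept_chain(self, chain_id):
        return 1 if chain_id == 'A' else 0

    def accept_residue(self, resname):
        return 1 if resname == 'GLY' else 0


def filter_structure(chains, select):
    """Walk a dict of chain id to residue names, asking the selector at each level."""
    kept = {}
    for chain_id, residues in chains.items():
        if not select.accept_chain(chain_id):
            continue  # a rejected chain is skipped together with its residues
        kept[chain_id] = [r for r in residues if select.accept_residue(r)]
    return kept


chains = {'A': ['GLY', 'ALA', 'GLY'], 'B': ['GLY', 'SER']}
print(filter_structure(chains, Select()))
print(filter_structure(chains, ChainAGlySelect()))
```
&lt;br /&gt;
The point of the design is that the writer never needs to know the selection rule; it only asks the selector a yes/no question for each entity.&lt;br /&gt;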
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forget about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
Figure (images/smcra.png): UML diagram of the SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object. Full lines with diamonds denote aggregation, full lines with&lt;br /&gt;
arrows denote referencing, full lines with triangles denote inheritance&lt;br /&gt;
and dashed lines with triangles denote interface realization. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
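&lt;br /&gt;
To see why the hetero-flag matters, the following sketch uses a plain Python dictionary in place of a real chain, so it does not touch Bio.PDB at all. The residue names are made up for the example, and the &amp;lt;code&amp;gt;full_id&amp;lt;/code&amp;gt; helper is hypothetical: it only mimics the shortcut described above, where a bare sequence identifier stands for a blank hetero-flag and a blank insertion code.&lt;br /&gt;
&lt;br /&gt;
```python
def full_id(key):
    """Expand a bare sequence identifier into a full residue id (hypothetical helper)."""
    if isinstance(key, int):
        return (' ', key, ' ')
    return key


# Three different residues that all share sequence identifier 100
chain = {
    (' ', 100, ' '): 'ALA',      # ordinary amino acid
    ('H_GLC', 100, ' '): 'GLC',  # hetero-residue (glucose)
    ('W', 100, ' '): 'HOH',      # water
}

print(chain[full_id(100)])                  # looks up the amino acid
print(chain[full_id(('H_GLC', 100, ' '))])  # looks up the glucose
print(chain[full_id(('W', 100, ' '))])      # looks up the water
```
&lt;br /&gt;
Without the hetero-flag, all three entries would collapse onto the same key.&lt;br /&gt;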
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;); the dots represent spaces here. In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the UML diagram of the SMCRA architecture above).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
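&lt;br /&gt;
The forwarding idea can be sketched in a few lines of plain Python. This is only an illustration of the pattern and not the Bio.PDB implementation: &amp;lt;code&amp;gt;SimpleAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedWrapper&amp;lt;/code&amp;gt; are hypothetical classes, and the occupancies and coordinates are invented. The wrapper stores one child per altloc identifier, selects the child with the highest occupancy by default, and passes every attribute it does not define itself on to the selected child.&lt;br /&gt;
&lt;br /&gt;
```python
class SimpleAtom:
    """Hypothetical minimal atom: altloc label, occupancy and coordinates."""

    def __init__(self, altloc, occupancy, coord):
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord

    def get_coord(self):
        return self.coord


class DisorderedWrapper:
    """Sketch of a container that behaves like its currently selected child."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the child with the highest occupancy
        self.selected = max(self.children.values(), key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only called when normal lookup fails: forward to the selected child
        return getattr(self.selected, name)


atom = DisorderedWrapper()
atom.add(SimpleAtom('A', 0.4, (1.0, 2.0, 3.0)))
atom.add(SimpleAtom('B', 0.6, (1.5, 2.0, 3.0)))
print(atom.get_coord())       # forwarded to altloc B (highest occupancy)
atom.disordered_select('A')
print(atom.get_coord())       # now forwarded to altloc A
```
&lt;br /&gt;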
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
UML diagram of the SMCRA architecture above).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
method calls to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factors if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
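&lt;br /&gt;
To show what overloading the minus operator amounts to, here is a minimal sketch in plain Python.&lt;br /&gt;
The &amp;lt;code&amp;gt;PointAtom&amp;lt;/code&amp;gt; class is a hypothetical stand-in invented for this example, not the Bio.PDB &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; class,&lt;br /&gt;
and it assumes that the distance meant here is the ordinary Euclidean distance between two coordinate triples.&lt;br /&gt;

```python
import math


class PointAtom:
    """Hypothetical stand-in for an atom; not Bio.PDB's Atom class."""

    def __init__(self, name, coord):
        self.name = name
        self.coord = tuple(float(x) for x in coord)

    def __sub__(self, other):
        # Subtraction returns the Euclidean distance, not a difference vector.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = PointAtom("CA", (0.0, 0.0, 0.0))
ca2 = PointAtom("CA", (3.0, 4.0, 0.0))
distance = ca1 - ca2
print(distance)
```

Because the result is a plain number, the order of the two atoms does not matter.&lt;br /&gt;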
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
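&lt;br /&gt;
For readers who want to see the geometry, the following plain-Python sketch computes the angle at the middle point of three points.&lt;br /&gt;
The function name &amp;lt;code&amp;gt;angle_between&amp;lt;/code&amp;gt; is made up for this example. It assumes (as far as I can tell, correctly) that &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; takes the second vector as the vertex and returns radians; check the API documentation before relying on that.&lt;br /&gt;

```python
import math


def angle_between(p1, p2, p3):
    """Angle in radians at p2 between the directions p2->p1 and p2->p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


right_angle = angle_between((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(math.degrees(right_angle))
```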
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
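&lt;br /&gt;
The sketch below shows one common way to compute a signed torsion angle from four points in plain Python.&lt;br /&gt;
It is not the Bio.PDB implementation; the function name &amp;lt;code&amp;gt;dihedral&amp;lt;/code&amp;gt; and the helpers are invented here, and the sign convention (result between -pi and pi) is an assumption of this example.&lt;br /&gt;

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Signed torsion angle in radians around the p2-p3 bond."""
    b0 = _sub(p1, p2)
    b1 = _sub(p3, p2)
    b2 = _sub(p4, p3)
    length = math.sqrt(_dot(b1, b1))
    axis = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, axis) * x for x in axis))
    w = _sub(b2, tuple(_dot(b2, axis) * x for x in axis))
    # atan2 keeps the sign, so the result lies between -pi and pi.
    return math.atan2(_dot(_cross(axis, v), w), _dot(v, w))


cis = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
print(math.degrees(cis), math.degrees(trans))
```

A cis arrangement gives an angle of zero and a trans arrangement an angle of magnitude pi, which is a quick sanity check for any dihedral routine.&lt;br /&gt;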
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
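&lt;br /&gt;
To make clear what a contact search returns, here is a brute-force sketch in plain Python that compares every pair of points.&lt;br /&gt;
It is only an illustration: the function name &amp;lt;code&amp;gt;contacts_brute_force&amp;lt;/code&amp;gt; is invented, it works on bare coordinate tuples rather than &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects, and it needs time quadratic in the number of points, which is exactly the cost a KD tree avoids.&lt;br /&gt;

```python
import itertools
import math


def contacts_brute_force(coords, radius):
    """Return index pairs (i, j), i before j, whose points lie within radius."""
    pairs = []
    for (i, p), (j, q) in itertools.combinations(enumerate(coords), 2):
        if radius >= math.dist(p, q):
            pairs.append((i, j))
    return pairs


points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(contacts_brute_force(points, 1.5))
```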
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder class: &amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms. Polypeptides can be built using a C-N distance criterion (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt;)&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion (&amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is possible to use the same builder classes&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
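&lt;br /&gt;
The counting idea behind HSE can be sketched in a few lines of plain Python.&lt;br /&gt;
This is an illustration of the HSE&amp;amp;beta; flavor only, under the assumption that the C&amp;amp;alpha;-C&amp;amp;beta; vector defines the side chain direction; the function name &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; is invented, the inputs are bare coordinate tuples, and it is not the code used in Bio.PDB.&lt;br /&gt;

```python
import math


def half_sphere_counts(ca, cb, other_cas, radius=13.0):
    """Count CA atoms in the side-chain half-sphere (up) and the opposite one (down)."""
    direction = [b - a for a, b in zip(ca, cb)]
    up = down = 0
    for other in other_cas:
        # Atoms outside the sphere do not contribute to either count.
        if math.dist(ca, other) > radius:
            continue
        offset = [o - a for a, o in zip(ca, other)]
        # The sign of the dot product tells on which side of the dividing plane the atom lies.
        if sum(d * x for d, x in zip(direction, offset)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


neighbours = [(0.0, 0.0, 5.0), (1.0, 0.0, -4.0), (0.0, 0.0, 20.0), (2.0, 2.0, 3.0)]
up, down = half_sphere_counts((0.0, 0.0, 0.0), (0.0, 0.0, 1.5), neighbours)
print(up, down)
```

The sum of the two counts is the classical contact number for the same radius, which is why one class can report both.&lt;br /&gt;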
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi*120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
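&lt;br /&gt;
For the curious, a rotation about an arbitrary axis can be written out in plain Python with Rodrigues' formula, as in the sketch below.&lt;br /&gt;
The function name &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; is invented for this example and the rotation is right-handed; whether that matches the handedness of &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; combined with &amp;lt;code&amp;gt;left_multiply&amp;lt;/code&amp;gt; is not checked here, so treat it as an assumption.&lt;br /&gt;

```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate 3D vector v by theta radians about axis (Rodrigues' formula)."""
    length = math.sqrt(sum(a * a for a in axis))
    k = [a / length for a in axis]
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return tuple(
        v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
        for i in range(3)
    )


# Right-handed quarter turn of the x axis about z.
rotated = rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 2.0), math.pi / 2)
print(rotated)
```

The axis does not have to be normalized beforehand, and the length of the rotated vector is unchanged.&lt;br /&gt;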
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix computations'', 2nd ed., Golub, G. &amp;amp; Van Loan (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
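&lt;br /&gt;
The quantity being minimized is the RMSD between paired points, which is easy to write down in plain Python.&lt;br /&gt;
The sketch below only evaluates the RMSD of two coordinate lists as they stand; it does not perform any superposition, and the function name &amp;lt;code&amp;gt;rmsd&amp;lt;/code&amp;gt; is invented for this example.&lt;br /&gt;

```python
import math


def rmsd(fixed, moving):
    """Root-mean-square deviation between two equal-length lists of 3D points."""
    if len(fixed) != len(moving):
        raise ValueError("both lists must contain the same number of points")
    total = sum(math.dist(p, q) ** 2 for p, q in zip(fixed, moving))
    return math.sqrt(total / len(fixed))


fixed_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moving_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(fixed_points, moving_points))
```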
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T03:38:04Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* You haven't answered my question yet! */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, windows, Mac,...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
: Hamelryck, T., Manderick, B. (2003) PDB parser and structure class implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As in the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
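&lt;br /&gt;
To give an idea of what a PDB-style subdirectory layout looks like, here is a small plain-Python sketch that builds the path for one entry.&lt;br /&gt;
It assumes the "divided" layout of the wwPDB server, where the subdirectory is named after the two middle characters of the PDB ID and the file follows the &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; naming shown earlier. The function name &amp;lt;code&amp;gt;divided_path&amp;lt;/code&amp;gt; is invented, and whether &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; produces exactly this path should be checked against the API documentation.&lt;br /&gt;

```python
import os


def divided_path(root, pdb_id):
    """Return root/(middle two characters)/pdb(id).ent for a four-character PDB ID."""
    code = pdb_id.lower()
    if len(code) != 4:
        raise ValueError("expected a four-character PDB ID")
    return os.path.join(root, code[1:3], "pdb%s.ent" % code)


print(divided_path("/data/pdb", "1FAT"))
```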
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). To deal with this, the PDBParser object can behave in&lt;br /&gt;
two ways: a restrictive way and a permissive way (THIS IS NOW THE&lt;br /&gt;
DEFAULT). The restrictive way used to be the default, but people seemed&lt;br /&gt;
to think that Bio.PDB 'crashed' due to a bug (hah!), so I changed it.&lt;br /&gt;
If you ever encounter a real bug, please tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
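&lt;br /&gt;
The difference between the two parser modes can be illustrated with a small,&lt;br /&gt;
self-contained sketch. This is not Bio.PDB's own code: &amp;lt;code&amp;gt;add_atoms&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;DuplicateAtomError&amp;lt;/code&amp;gt; are hypothetical names, used only to show the idea&lt;br /&gt;
of leaving a duplicate out (permissive) versus raising an exception (strict):&lt;br /&gt;
&lt;br /&gt;
```python
class DuplicateAtomError(Exception):
    """Hypothetical exception raised in strict mode."""


def add_atoms(records, permissive=True):
    """Collect (name, altloc, coord) records of one residue.

    Permissive mode leaves out a repeated (name, altloc) identifier,
    strict mode raises an exception instead.
    """
    seen = {}
    for name, altloc, coord in records:
        key = (name, altloc)
        if key in seen:
            if permissive:
                continue  # leave the duplicate atom out
            raise DuplicateAtomError("duplicate atom %s" % (key,))
        seen[key] = coord
    return seen


records = [
    ("CA", " ", (1.0, 2.0, 3.0)),
    ("CA", " ", (1.1, 2.1, 3.1)),  # same name and altloc: an error in the file
    ("CB", " ", (2.0, 3.0, 4.0)),
]
atoms = add_atoms(records, permissive=True)
print(sorted(atoms))
```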
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBIO, Select&lt;br /&gt;
&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        # Keep only glycine residues&lt;br /&gt;
        if residue.get_resname()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture : &lt;br /&gt;
&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forget about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the SMCRA figure below.&lt;br /&gt;
&lt;br /&gt;
'''Figure (images/smcra.png):''' UML diagram of the SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object. Full lines with diamonds denote aggregation, full lines with&lt;br /&gt;
arrows denote referencing, full lines with triangles denote inheritance&lt;br /&gt;
and dashed lines with triangles denote interface realization.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
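&lt;br /&gt;
To see why the hetero-flag matters, here is a minimal sketch in which a chain&lt;br /&gt;
is modelled as a plain dictionary keyed by full residue ids. The function&lt;br /&gt;
&amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; and the residue names are made up for illustration;&lt;br /&gt;
Bio.PDB's &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; class is more involved than this:&lt;br /&gt;
&lt;br /&gt;
```python
def full_residue_id(key):
    """Expand an integer shortcut to a (hetero-flag, sequence id, icode) tuple."""
    if isinstance(key, int):
        return (" ", key, " ")
    return key


# Three residues share sequence identifier 100; only the hetero-flag
# and insertion code keep their ids apart.
chain = {
    (" ", 100, " "): "ASN",
    ("H_GLC", 100, "A"): "GLC",
    ("W", 100, " "): "HOH",
}

print(chain[full_residue_id(100)])                    # the amino acid
print(chain[full_residue_id(("H_GLC", 100, "A"))])    # the glucose
print(chain[full_residue_id(("W", 100, " "))])        # the water
```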
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom id is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; the dots stand for spaces). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
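&lt;br /&gt;
The naming rule can be sketched as follows. The function &amp;lt;code&amp;gt;assign_atom_ids&amp;lt;/code&amp;gt;&lt;br /&gt;
is hypothetical, and it assumes that only the later of two clashing atoms&lt;br /&gt;
keeps its spaces; the text above does not say which of the two is affected:&lt;br /&gt;
&lt;br /&gt;
```python
def assign_atom_ids(fullnames):
    """Turn 4-character PDB atom names of one residue into atom ids."""
    ids = []
    for fullname in fullnames:
        short = fullname.strip()
        # Keep the spaces only when the stripped name is already taken
        ids.append(fullname if short in ids else short)
    return ids


# A C-alpha atom (' CA ') and a calcium atom ('CA  ') in the same residue
print(assign_atom_ids([" N  ", " CA ", " C  ", "CA  "]))
```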
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the SMCRA UML diagram).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
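&lt;br /&gt;
The forwarding mechanism described above can be sketched in a few lines.&lt;br /&gt;
&amp;lt;code&amp;gt;SimpleAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedWrapper&amp;lt;/code&amp;gt; are stand-ins invented for&lt;br /&gt;
this illustration, not the real &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; classes:&lt;br /&gt;
&lt;br /&gt;
```python
class SimpleAtom:
    """Stand-in for one alternative position of an atom."""

    def __init__(self, name, altloc, occupancy, coord):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord

    def get_coord(self):
        return self.coord

    def get_altloc(self):
        return self.altloc


class DisorderedWrapper:
    """Holds all positions of one physical atom, keyed by altloc."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the position with the highest occupancy
        if self.selected is None or atom.occupancy > self.selected.occupancy:
            self.selected = atom

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only reached for attributes the wrapper itself does not have
        return getattr(self.selected, name)


og = DisorderedWrapper()
og.add(SimpleAtom("OG", "A", 0.4, (1.0, 0.0, 0.0)))
og.add(SimpleAtom("OG", "B", 0.6, (0.0, 1.0, 0.0)))
default_altloc = og.get_altloc()   # forwarded to the altloc B position
og.disordered_select("A")
selected_coord = og.get_coord()    # now forwarded to the altloc A position
```
&lt;br /&gt;
Code that only calls &amp;lt;code&amp;gt;get_coord()&amp;lt;/code&amp;gt; never needs to know that it is talking to a wrapper.&lt;br /&gt;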
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
SMCRA UML diagram).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
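&lt;br /&gt;
If you wonder how subtracting two objects can yield a number, here is a&lt;br /&gt;
minimal sketch of the operator overloading involved. The &amp;lt;code&amp;gt;Point&amp;lt;/code&amp;gt; class&lt;br /&gt;
is a hypothetical stand-in for &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; and only stores coordinates:&lt;br /&gt;
&lt;br /&gt;
```python
import math


class Point:
    """Stand-in for an atom: just a coordinate triple."""

    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Euclidean distance between the two coordinate triples
        return math.sqrt(sum((a - b) ** 2
                             for a, b in zip(self.coord, other.coord)))


ca1 = Point(0.0, 0.0, 0.0)
ca2 = Point(3.0, 4.0, 0.0)
distance = ca1 - ca2
print(distance)
```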
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
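&lt;br /&gt;
For reference, the geometry behind such an angle calculation can be written&lt;br /&gt;
out with plain tuples. This is a sketch, not the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module's code;&lt;br /&gt;
it assumes the angle is measured at the middle point and returns radians:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def angle(p1, p2, p3):
    """Angle in radians at p2 between the directions p2->p1 and p2->p3."""
    a = [x - y for x, y in zip(p1, p2)]
    b = [x - y for x, y in zip(p3, p2)]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against rounding error before acos
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


right_angle = angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
print(math.degrees(right_angle))
```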
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
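&lt;br /&gt;
The torsion angle itself is the angle between the plane through the first&lt;br /&gt;
three points and the plane through the last three. A self-contained sketch&lt;br /&gt;
(not the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module's code) follows; it returns radians, and its sign&lt;br /&gt;
convention (clockwise is positive when looking from the second to the third&lt;br /&gt;
point) is an assumption of this example:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians, between -pi and pi."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1 = cross(b1, b2)   # normal of the plane through p1, p2, p3
    n2 = cross(b2, b3)   # normal of the plane through p2, p3, p4
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


quarter_turn = dihedral((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 0, 1))
print(math.degrees(quarter_turn))
```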
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the screens, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;. However, it is possible to use &amp;lt;code&amp;gt;PolypeptideBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table of DSSP codes below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
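&lt;br /&gt;
To make the two definitions concrete, here is a small sketch that computes&lt;br /&gt;
both depths from a list of surface points. The function names and the toy&lt;br /&gt;
coordinates are made up; in practice the surface points come from MSMS:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def min_dist(point, surface):
    """Distance from a point to the nearest surface point."""
    return min(math.sqrt(sum((p - s) ** 2 for p, s in zip(point, surface_point)))
               for surface_point in surface)


def depths(atom_coords, ca_coord, surface):
    """Return (residue depth, C-alpha depth) for one residue."""
    # Residue depth: average over all atoms of the residue
    residue_depth = sum(min_dist(a, surface) for a in atom_coords) / len(atom_coords)
    return residue_depth, min_dist(ca_coord, surface)


surface = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
atoms = [(0.0, 0.0, 3.0), (0.0, 4.0, 0.0)]
residue_depth, ca_depth = depths(atoms, atoms[0], surface)
print(residue_depth, ca_depth)
```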
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print exp_ca[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi*120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
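&lt;br /&gt;
For the curious, the same construction can be sketched without Bio.PDB, using Rodrigues' rotation formula and only the standard library. This is an illustration, not the code used by the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module: the function names are hypothetical, coordinates are plain lists, and a right-handed rotation convention is assumed.&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def rotate_about_axis(v, axis, theta):&lt;br /&gt;
    # Rodrigues' formula: rotate v by theta radians around axis&lt;br /&gt;
    norm = math.sqrt(sum(a * a for a in axis))&lt;br /&gt;
    k = [a / norm for a in axis]&lt;br /&gt;
    cos_t, sin_t = math.cos(theta), math.sin(theta)&lt;br /&gt;
    k_dot_v = sum(a * b for a, b in zip(k, v))&lt;br /&gt;
    k_cross_v = [k[1] * v[2] - k[2] * v[1],&lt;br /&gt;
                 k[2] * v[0] - k[0] * v[2],&lt;br /&gt;
                 k[0] * v[1] - k[1] * v[0]]&lt;br /&gt;
    return [v[i] * cos_t + k_cross_v[i] * sin_t&lt;br /&gt;
            + k[i] * k_dot_v * (1.0 - cos_t) for i in range(3)]&lt;br /&gt;
&lt;br /&gt;
def virtual_cb(n, ca, c):&lt;br /&gt;
    # Rotate the CA-N vector by -120 degrees around the CA-C vector,&lt;br /&gt;
    # then translate the result back onto the CA position&lt;br /&gt;
    n0 = [a - b for a, b in zip(n, ca)]&lt;br /&gt;
    c0 = [a - b for a, b in zip(c, ca)]&lt;br /&gt;
    cb0 = rotate_about_axis(n0, c0, -math.pi * 120.0 / 180.0)&lt;br /&gt;
    return [a + b for a, b in zip(cb0, ca)]&lt;br /&gt;
&lt;br /&gt;
# Toy coordinates (hypothetical, not taken from a real structure)&lt;br /&gt;
print(virtual_cb([1.0, 1.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 1.5]))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Since a rotation preserves lengths, the virtual C&amp;amp;beta; ends up at the same distance from the C&amp;amp;alpha; as the N atom, which is why the result is only a rough position.&lt;br /&gt;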
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix Computations'', 2nd ed., Golub, G. &amp;amp; Van Loan, C. (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print sup.rotran&lt;br /&gt;
print sup.rms &lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
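As a reminder of what the reported value measures, here is a plain-Python sketch of the RMSD between two equally long lists of coordinates that have already been superimposed. It is not how &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; computes the value internally, and the function name is hypothetical.&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import math&lt;br /&gt;
&lt;br /&gt;
def rmsd(fixed_coords, moving_coords):&lt;br /&gt;
    # Root-mean-square deviation over paired (x, y, z) coordinates&lt;br /&gt;
    if len(fixed_coords) != len(moving_coords):&lt;br /&gt;
        raise ValueError('coordinate lists must have the same length')&lt;br /&gt;
    total = 0.0&lt;br /&gt;
    for p, q in zip(fixed_coords, moving_coords):&lt;br /&gt;
        total += sum((a - b) ** 2 for a, b in zip(p, q))&lt;br /&gt;
    return math.sqrt(total / len(fixed_coords))&lt;br /&gt;
&lt;br /&gt;
# Toy data (hypothetical): every atom is displaced by 2 along z&lt;br /&gt;
print(rmsd([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],&lt;br /&gt;
           [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;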
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T03:36:40Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* Can I contribute? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As in the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
handle = open(filename,'r')&lt;br /&gt;
header_dict = parse_pdb_header(handle)&lt;br /&gt;
handle.close()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
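&lt;br /&gt;
As a rough illustration of the PDB-style layout, the sketch below builds the path under which an entry would be stored. It assumes the 'divided' convention of the wwPDB server, where the subdirectory is named after the two middle characters of the PDB ID. The function name is hypothetical and not part of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;.&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import os&lt;br /&gt;
&lt;br /&gt;
def divided_pdb_path(pdb_root, pdb_id):&lt;br /&gt;
    # For '1FAT' this gives pdb_root/fa/pdb1fat.ent&lt;br /&gt;
    code = pdb_id.lower()&lt;br /&gt;
    return os.path.join(pdb_root, code[1:3], 'pdb%s.ent' % code)&lt;br /&gt;
&lt;br /&gt;
print(divided_pdb_path('/data/pdb', '1FAT'))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;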
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_name()=='GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forget about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
''Figure (images/smcra.png): UML diagram of the SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object. Full lines with diamonds denote aggregation, full lines with arrows denote referencing, full lines with triangles denote inheritance and dashed lines with triangles denote interface realization.''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print atom&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print residue&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Selection&lt;br /&gt;
&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
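&lt;br /&gt;
To make the tuple layout concrete, here is a minimal plain-Python sketch. The helper &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; is hypothetical (it is not part of Bio.PDB); it only illustrates how a shortcut id corresponds to a full id with a blank hetero-flag and insertion code, and why residues sharing a sequence identifier stay distinct:&lt;br /&gt;
```python
def full_residue_id(res_id):
    """Expand a shortcut residue id to a (hetero-flag, seq id, insertion code) tuple.

    Hypothetical helper for illustration only; not part of Bio.PDB.
    """
    if isinstance(res_id, int):
        # Blank hetero-flag and blank insertion code: an ordinary residue
        return (' ', res_id, ' ')
    hetero_flag, seq_id, insertion_code = res_id
    return (hetero_flag, seq_id, insertion_code)


# An amino acid, a glucose and a water that all use sequence identifier 100
ids = [full_residue_id(100), ('H_GLC', 100, ' '), ('W', 100, ' ')]
print(len(set(ids)))  # prints 3: the hetero-flag keeps the ids distinct
```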
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the SMCRA diagram).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
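The forwarding behaviour described above can be sketched with two toy classes. Both &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedAtom&amp;lt;/code&amp;gt; are hypothetical stand-ins written for this illustration, not the actual Bio.PDB implementation; the sketch assumes that forwarding is done through Python's attribute lookup fallback:&lt;br /&gt;
```python
class ToyAtom(object):
    """Hypothetical stand-in for an Atom; not the Bio.PDB class."""

    def __init__(self, name, altloc, occupancy):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy

    def get_occupancy(self):
        return self.occupancy


class ToyDisorderedAtom(object):
    """Hypothetical wrapper that forwards unknown attributes to the selected atom."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # Default selection: the child with the highest occupancy
        self.selected = max(self.children.values(), key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, attr):
        # Only called when normal lookup fails, i.e. for "uncaught" names
        return getattr(self.selected, attr)


og = ToyDisorderedAtom()
og.add(ToyAtom('OG', 'A', 0.7))
og.add(ToyAtom('OG', 'B', 0.3))
print(og.get_occupancy())   # 0.7, forwarded to altloc A (highest occupancy)
og.disordered_select('B')
print(og.get_occupancy())   # 0.3, now forwarded to altloc B
```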
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
SMCRA diagram).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
method calls to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (eg. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
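from Bio.PDB import calc_angle&lt;br /&gt;
&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
# Angle at atom2 between atom1 and atom3, in radians&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;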
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
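from Bio.PDB import calc_dihedral&lt;br /&gt;
&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
vector4 = atom4.get_vector()&lt;br /&gt;
# Torsion angle around the atom2-atom3 bond, in radians&lt;br /&gt;
angle = calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;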
&amp;lt;/python&amp;gt;&lt;br /&gt;
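If you are curious what such a calculation involves, the following is a minimal sketch in plain Python using the standard atan2 formulation of a torsion angle. It works on coordinate tuples rather than &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; objects, the function names are made up for this example, and it is not claimed to be how &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; is implemented:&lt;br /&gt;
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians defined by four points (plain tuples)."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n2 = cross(b2, b3)
    # atan2 keeps the sign of the angle and stays stable near 0 and 180 degrees
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(cross(b1, b2), n2)
    return math.atan2(y, x)


# Four made-up points forming a right-angled torsion
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(math.degrees(angle))  # about 90 degrees for this geometry
```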
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import PPBuilder, CaPPBuilder&lt;br /&gt;
&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is possible to use the same builders&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| class="wikitable"&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;code&amp;gt;-&amp;lt;/code&amp;gt; || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. This class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
from Bio.PDB import rotaxis&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
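&lt;br /&gt;
For readers who want to see the geometry without Bio.PDB, here is a minimal plain-Python sketch of the same rotation using Rodrigues' rotation formula. The function &amp;lt;code&amp;gt;rotate_about_axis&amp;lt;/code&amp;gt; and the backbone coordinates are made up for this illustration; it is assumed, not verified here, that this matches the rotation direction of &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt;:&lt;br /&gt;
```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate vector v by theta radians about axis (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    k_dot_v = sum(ki * vi for ki, vi in zip(k, v))
    k_cross_v = [k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0]]
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [vi * cos_t + ci * sin_t + ki * k_dot_v * (1.0 - cos_t)
            for vi, ci, ki in zip(v, k_cross_v, k)]


# Made-up backbone coordinates in Angstrom (not taken from a real structure)
ca = [0.0, 0.0, 0.0]
n = [1.46, 0.0, 0.0]
c = [-0.55, 1.42, 0.0]

# Center at the CA atom, rotate N by -120 degrees about the CA-C axis, move back
n_rel = [a - b for a, b in zip(n, ca)]
c_rel = [a - b for a, b in zip(c, ca)]
cb_rel = rotate_about_axis(n_rel, c_rel, -math.pi * 120.0 / 180.0)
cb = [a + b for a, b in zip(cb_rel, ca)]
print(cb)
```
A rotation keeps the CA-N bond length and the angle to the rotation axis, so the virtual C&amp;beta; ends up at the same distance from C&amp;alpha; as the N atom.&lt;br /&gt;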
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix Computations'', 2nd ed., Golub, G. &amp;amp; Van Loan, C. (1989), and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Superimposer&lt;br /&gt;
&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro&lt;br /&gt;
Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send an e-mail to [mailto:thamelry@binf.ku.dk Thomas Hamelryck] or to [mailto:biopython-dev@biopython.org the Biopython developers] if you have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T03:34:22Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How do I superimpose two structures? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large-scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]),&lt;br /&gt;
and to develop a new algorithm that identifies linear secondary structure elements&lt;br /&gt;
([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FFK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
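&lt;br /&gt;
The averages follow directly from the totals quoted above. A quick check in plain Python (the variable names are chosen here for illustration, and the inputs are the approximate figures from the text):&lt;br /&gt;
&lt;br /&gt;
```python
# Approximate figures quoted in the text above
n_structures = 800
total_minutes = 20

# Average parsing time per structure, in seconds
seconds_per_structure = total_minutes * 60.0 / n_structures
print(seconds_per_structure)  # 1.5

# Rough throughput for the large ribosomal subunit example
atoms_per_second = 64000 / 10.0
print(atoms_per_second)  # 6400.0
```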
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly because there are already quite a few Python-based/Python-aware&lt;br /&gt;
solutions that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is PyMOL, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMOL modules in Bio.PDB soon/some&lt;br /&gt;
day). Python-based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMOL: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As with PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a Python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.MMCIF2Dict import MMCIF2Dict&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
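&lt;br /&gt;
Since a tag can map either to a single value or to a list, code that reads the dictionary often has to cope with both. Here is a minimal sketch in plain Python: the dictionary is a hand-made stand-in with invented values, &amp;lt;code&amp;gt;as_list&amp;lt;/code&amp;gt; is a hypothetical helper (not part of Bio.PDB), and the values are assumed to be strings that need explicit conversion:&lt;br /&gt;
&lt;br /&gt;
```python
# Hand-made stand-in for the dictionary described above; the values are invented
mmcif_dict = {
    '_exptl_crystal.density_percent_sol': '51.2',
    '_atom_site.Cartn_y': ['10.5', '11.2', '12.9'],
}

def as_list(value):
    """Return value unchanged if it is a list, otherwise wrap it in a list."""
    if isinstance(value, list):
        return value
    return [value]

# Assumption of this sketch: values are strings and are converted explicitly
y_coords = [float(y) for y in as_list(mmcif_dict['_atom_site.Cartn_y'])]
solvent = [float(v) for v in as_list(mmcif_dict['_exptl_crystal.density_percent_sol'])]
print(y_coords)  # [10.5, 11.2, 12.9]
print(solvent)   # [51.2]
```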
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother, you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt;, which is&lt;br /&gt;
a Python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# filename is the path to a PDB file, e.g. '1FAT.pdb'&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
This can be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object, using the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method. The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
a specific directory in which to store the downloaded PDB files. &lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
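&lt;br /&gt;
To make the two layouts concrete, here is a small sketch in plain Python. It assumes that the PDB-style subdirectories mirror the 'divided' layout of the wwPDB server, where the middle two characters of the PDB id name the subdirectory; &amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; is a hypothetical helper and not part of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
```python
import posixpath

def local_pdb_path(pdb_code, root, flat=False):
    """Return where a downloaded entry would live under root.

    Hypothetical helper for illustration. flat=True mimics the -d option
    (all files in one directory); otherwise the middle two characters of
    the PDB id name the subdirectory (assumed 'divided' layout).
    """
    code = pdb_code.lower()
    name = 'pdb' + code + '.ent'
    if flat:
        return posixpath.join(root, name)
    return posixpath.join(root, code[1:3], name)

print(local_pdb_path('1FAT', '/data/pdb'))             # /data/pdb/fa/pdb1fat.ent
print(local_pdb_path('1FAT', '/data/pdb', flat=True))  # /data/pdb/pdb1fat.ent
```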
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but about their representation&lt;br /&gt;
in PDB files). The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object can&lt;br /&gt;
behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
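&lt;br /&gt;
The difference between the two modes can be sketched in a few lines of plain Python. This is a simplified illustration of the behaviour described above for duplicate residue identifiers, not the parser's actual code; the function and exception names are hypothetical:&lt;br /&gt;
&lt;br /&gt;
```python
class ConstructionError(Exception):
    """Raised by the strict variant of this sketch (hypothetical name)."""

def collect_residue_ids(residue_ids, permissive=True):
    """Keep the first occurrence of each residue id.

    In permissive mode duplicates are left out; in strict mode the
    first duplicate raises an exception.
    """
    seen = set()
    kept = []
    for res_id in residue_ids:
        if res_id in seen:
            if not permissive:
                raise ConstructionError('duplicate residue id: %r' % (res_id,))
            continue  # permissive: leave the duplicate out
        seen.add(res_id)
        kept.append(res_id)
    return kept

ids = [(' ', 1, ' '), (' ', 2, ' '), (' ', 2, ' ')]
print(collect_residue_ids(ids))  # [(' ', 1, ' '), (' ', 2, ' ')]
```
&lt;br /&gt;
Calling &amp;lt;code&amp;gt;collect_residue_ids(ids, permissive=False)&amp;lt;/code&amp;gt; on the same list raises the exception instead, which is the behaviour you want when hunting for errors in PDB files.&lt;br /&gt;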
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(structure)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(structure)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
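&lt;br /&gt;
The idea behind the &amp;lt;code&amp;gt;accept_*&amp;lt;/code&amp;gt; hooks can be sketched without Bio.PDB at all. The classes and the helper below are hypothetical stand-ins that work on residue names only; they illustrate the pattern (a default that accepts everything, a subclass that overrides one hook), not the real &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt; code:&lt;br /&gt;
&lt;br /&gt;
```python
class SketchSelect(object):
    """Stand-in for the selection interface described above."""
    def accept_residue(self, resname):
        return 1  # default: include everything

class SketchGlySelect(SketchSelect):
    """Only accept glycine."""
    def accept_residue(self, resname):
        return 1 if resname == 'GLY' else 0

def filter_residues(resnames, select=None):
    """Return the residue names a writer would keep (hypothetical helper)."""
    if select is None:
        select = SketchSelect()
    return [name for name in resnames if select.accept_residue(name)]

names = ['GLY', 'ALA', 'GLY', 'SER']
print(filter_residues(names))                     # ['GLY', 'ALA', 'GLY', 'SER']
print(filter_residues(names, SketchGlySelect()))  # ['GLY', 'GLY']
```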
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture : &lt;br /&gt;
&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forget about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
'''Figure:''' UML diagram of the SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object. Full lines with diamonds denote aggregation, full lines with&lt;br /&gt;
arrows denote referencing, full lines with triangles denote inheritance&lt;br /&gt;
and dashed lines with triangles denote interface realization.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
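&lt;br /&gt;
Going up in the hierarchy boils down to collecting each parent once. A minimal sketch of that idea in plain Python follows; &amp;lt;code&amp;gt;Node&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;unique_parents&amp;lt;/code&amp;gt; are hypothetical stand-ins, and this is not how &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; is necessarily implemented:&lt;br /&gt;
&lt;br /&gt;
```python
class Node(object):
    """Minimal stand-in for an entity that knows its parent."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def get_parent(self):
        return self.parent

def unique_parents(children):
    """Return each parent once, in order of first appearance."""
    parents = []
    for child in children:
        parent = child.get_parent()
        # Compare by identity so two distinct residues are never merged
        if not any(parent is p for p in parents):
            parents.append(parent)
    return parents

gly = Node('GLY 1')
ala = Node('ALA 2')
atoms = [Node('N', gly), Node('CA', gly), Node('N', ala)]
print([r.name for r in unique_parents(atoms)])  # ['GLY 1', 'ALA 2']
```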
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
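&lt;br /&gt;
The shortcut and the role of the hetero-flag can be illustrated with an ordinary dictionary. This is only a sketch of the idea: &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; is a hypothetical helper and the dictionary stands in for a chain:&lt;br /&gt;
&lt;br /&gt;
```python
def full_residue_id(key):
    """Expand the integer shortcut to a (hetero-flag, sequence id, insertion code) tuple."""
    if isinstance(key, int):
        return (' ', key, ' ')
    return key

# Stand-in for a chain: an amino acid and a water share sequence identifier 100
chain = {
    (' ', 100, ' '): 'amino acid 100',
    ('W', 100, ' '): 'water 100',
}

print(full_residue_id(100))               # (' ', 100, ' ')
print(chain[full_residue_id(100)])        # amino acid 100
print(chain[('W', 100, ' ')])             # water 100
```
&lt;br /&gt;
Without the hetero-flag, both entries would have the same key and one of them would be lost.&lt;br /&gt;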
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; the dots represent spaces). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
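&lt;br /&gt;
One possible reading of this rule, written out in plain Python. The helper &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; is hypothetical and simplified (here every atom involved in a clash keeps its spaces); the real parser may resolve clashes differently:&lt;br /&gt;
&lt;br /&gt;
```python
def atom_ids(fullnames):
    """Map four-character PDB atom names to ids within one residue.

    Spaces are stripped unless that would give two atoms the same id,
    in which case the full names (spaces included) are kept.
    """
    stripped = [name.strip() for name in fullnames]
    ids = []
    for full, short in zip(fullnames, stripped):
        ids.append(short if stripped.count(short) == 1 else full)
    return ids

print(atom_ids([' N  ', ' CA ', ' C  ']))  # ['N', 'CA', 'C']
print(atom_ids([' CA ', 'CA  ']))          # [' CA ', 'CA  ']
```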
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the SMCRA UML diagram above).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
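&lt;br /&gt;
The forwarding mechanism itself is easy to sketch in plain Python. The two classes below are hypothetical stand-ins, not the Bio.PDB classes; they only show how a wrapper can hand every call it does not define itself to the currently selected child, which defaults to the one with the highest occupancy:&lt;br /&gt;
&lt;br /&gt;
```python
class SimpleAtom(object):
    """One position of an atom (stand-in for an Atom object)."""
    def __init__(self, altloc, occupancy, coord):
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord

    def get_coord(self):
        return self.coord

class DisorderedWrapper(object):
    """Holds alternative positions and behaves like the selected one."""
    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, atom):
        self.children[atom.altloc] = atom
        # By default, follow the position with the highest occupancy
        self.selected = max(self.children.values(), key=lambda a: a.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Only reached for attributes the wrapper does not define itself
        return getattr(self.selected, name)

atom = DisorderedWrapper()
atom.add(SimpleAtom('A', 0.4, (1.0, 2.0, 3.0)))
atom.add(SimpleAtom('B', 0.6, (1.5, 2.0, 3.0)))
print(atom.get_coord())  # (1.5, 2.0, 3.0), altloc B has the highest occupancy
atom.disordered_select('A')
print(atom.get_coord())  # (1.0, 2.0, 3.0)
```
&lt;br /&gt;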
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
SMCRA UML diagram above).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
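&lt;br /&gt;
The forwarding behaviour described above can be illustrated with a small stand-alone sketch.&lt;br /&gt;
The classes &amp;lt;code&amp;gt;ToyResidue&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;ToyDisorderedResidue&amp;lt;/code&amp;gt; are hypothetical and are not&lt;br /&gt;
the Bio.PDB implementation; they only show how a wrapper can pass unknown method calls on&lt;br /&gt;
to whichever child is currently selected:&lt;br /&gt;
&lt;br /&gt;
```python
class ToyResidue:
    """Hypothetical stand-in for a residue; not Bio.PDB's Residue class."""

    def __init__(self, resname):
        self.resname = resname

    def get_resname(self):
        return self.resname


class ToyDisorderedResidue:
    """Hypothetical wrapper that forwards unknown attributes to the selected child."""

    def __init__(self):
        self.children = {}
        self.selected = None

    def add(self, residue):
        # Children are keyed by residue name; the last one added becomes active.
        self.children[residue.get_resname()] = residue
        self.selected = residue

    def disordered_select(self, resname):
        self.selected = self.children[resname]

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for "uncaught" attributes.
        return getattr(self.selected, name)


position_60 = ToyDisorderedResidue()
position_60.add(ToyResidue("SER"))
position_60.add(ToyResidue("CYS"))
default_name = position_60.get_resname()    # forwarded to the last residue added
position_60.disordered_select("SER")
selected_name = position_60.get_resname()   # now forwarded to the Ser residue
```
&lt;br /&gt;
Here &amp;lt;code&amp;gt;default_name&amp;lt;/code&amp;gt; comes from the Cys residue (added last) and &amp;lt;code&amp;gt;selected_name&amp;lt;/code&amp;gt; from the Ser residue.&lt;br /&gt;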
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. ' CA ')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (eg. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
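&lt;br /&gt;
To make the overloading concrete, here is a minimal sketch of how a minus operator can&lt;br /&gt;
return a Euclidean distance. &amp;lt;code&amp;gt;ToyAtom&amp;lt;/code&amp;gt; is a hypothetical class used only for&lt;br /&gt;
illustration, not Bio.PDB's &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;:&lt;br /&gt;
&lt;br /&gt;
```python
import math


class ToyAtom:
    """Hypothetical stand-in for an atom; not Bio.PDB's Atom class."""

    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Euclidean distance between the two coordinate triples.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = ToyAtom(0.0, 0.0, 0.0)
ca2 = ToyAtom(3.0, 4.0, 0.0)
distance = ca1 - ca2
```
&lt;br /&gt;
The result is a plain number in the same unit as the coordinates.&lt;br /&gt;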
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
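&lt;br /&gt;
For readers who want to see the underlying arithmetic, the following stand-alone sketch&lt;br /&gt;
computes the angle at the middle point from plain coordinate tuples. The helper&lt;br /&gt;
&amp;lt;code&amp;gt;angle_between&amp;lt;/code&amp;gt; is a hypothetical name and returns radians; whether it matches&lt;br /&gt;
&amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; in every detail is an assumption:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def angle_between(p1, p2, p3):
    """Angle in radians at p2 between the directions p2-to-p1 and p2-to-p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] so rounding errors cannot push acos out of its domain.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


right_angle = angle_between((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
degrees = math.degrees(right_angle)
```
&lt;br /&gt;
Use &amp;lt;code&amp;gt;math.degrees&amp;lt;/code&amp;gt; as shown if you prefer degrees over radians.&lt;br /&gt;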
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1=atom1.get_vector()&lt;br /&gt;
vector2=atom2.get_vector()&lt;br /&gt;
vector3=atom3.get_vector()&lt;br /&gt;
vector4=atom4.get_vector()&lt;br /&gt;
angle=calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
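&lt;br /&gt;
The torsion angle itself can be sketched with plain tuples as follows. This is an&lt;br /&gt;
illustration of the standard atan2 formulation, with hypothetical helper names; the sign&lt;br /&gt;
convention used here (positive for a clockwise turn when looking from the second point&lt;br /&gt;
to the third) is an assumption and should be checked against &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt;&lt;br /&gt;
before mixing results:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians around the p2-p3 bond."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    n1, n2 = cross(b1, b2), cross(b2, b3)
    # atan2 keeps the sign, which a plain arccos of the normals would lose.
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


torsion = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
torsion_degrees = math.degrees(torsion)
```
&lt;br /&gt;
The four points are made up so that the torsion is a quarter turn.&lt;br /&gt;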
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
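&lt;br /&gt;
To show what question a contact search answers, here is a brute-force sketch on plain&lt;br /&gt;
coordinate tuples. It is not how &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt; works internally: it compares&lt;br /&gt;
every pair, which is exactly the quadratic cost a KD tree avoids. The function name&lt;br /&gt;
&amp;lt;code&amp;gt;contacts&amp;lt;/code&amp;gt; and the coordinates are made up for the example (requires Python 3.8 or&lt;br /&gt;
later for &amp;lt;code&amp;gt;math.dist&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
```python
import math
from itertools import combinations


def contacts(coords, cutoff):
    """Return index pairs (i, j) of points that lie within `cutoff` of each other."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if cutoff >= math.dist(a, b):
            pairs.append((i, j))
    return pairs


coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
close_pairs = contacts(coords, cutoff=2.0)
```
&lt;br /&gt;
Only the first two points are within 2.0 of each other, so a single pair is reported.&lt;br /&gt;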
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
objects to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the builder. However, it is also possible to pass a &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
object to &amp;lt;code&amp;gt;build_peptides&amp;lt;/code&amp;gt; to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from it.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; print seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and thus by consequence the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| border="1"&lt;br /&gt;
|+ DSSP codes in Bio.PDB.&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
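&lt;br /&gt;
The definition of residue depth given above can be sketched on toy data as follows. This&lt;br /&gt;
is only an illustration of the averaging step: the surface is represented by a made-up&lt;br /&gt;
list of points rather than by MSMS output, the function names are hypothetical, and the&lt;br /&gt;
distance to the surface is approximated by the distance to the nearest surface point&lt;br /&gt;
(requires Python 3.8 or later for &amp;lt;code&amp;gt;math.dist&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
```python
import math


def min_distance_to_surface(point, surface_points):
    """Distance from `point` to the nearest surface vertex."""
    return min(math.dist(point, s) for s in surface_points)


def toy_residue_depth(atom_coords, surface_points):
    """Average, over a residue's atoms, of the distance to the surface."""
    depths = [min_distance_to_surface(a, surface_points) for a in atom_coords]
    return sum(depths) / len(depths)


# Made-up numbers: two surface vertices and a two-atom "residue".
surface = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
atoms = [(0.0, 3.0, 0.0), (10.0, 0.0, 5.0)]
depth = toy_residue_depth(atoms, surface)
```
&lt;br /&gt;
The two atoms lie 3 and 5 units from their nearest surface point, so the toy depth is their mean.&lt;br /&gt;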
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. The latter class has methods which return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
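&lt;br /&gt;
The counting idea behind HSE, as described in words above, can be sketched without Bio.PDB.&lt;br /&gt;
The function &amp;lt;code&amp;gt;half_sphere_counts&amp;lt;/code&amp;gt; and all coordinates below are hypothetical; the&lt;br /&gt;
direction vector simply stands in for the side-chain direction, and the real class may&lt;br /&gt;
differ in details such as how that direction is derived (requires Python 3.8 or later&lt;br /&gt;
for &amp;lt;code&amp;gt;math.dist&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
```python
import math


def half_sphere_counts(center, direction, neighbours, radius=13.0):
    """Count neighbours within `radius`, split by the side of the plane
    through `center` that is perpendicular to `direction`."""
    up = down = 0
    for point in neighbours:
        if math.dist(center, point) > radius:
            continue
        offset = [p - c for p, c in zip(point, center)]
        # A non-negative dot product means the point lies on the side
        # that `direction` points to.
        if sum(o * d for o, d in zip(offset, direction)) >= 0:
            up += 1
        else:
            down += 1
    return up, down


ca = (0.0, 0.0, 0.0)
ca_to_cb = (0.0, 0.0, 1.0)          # stand-in for the side-chain direction
other_cas = [(0.0, 0.0, 5.0), (3.0, 0.0, 4.0), (0.0, 2.0, -6.0), (0.0, 0.0, 20.0)]
hse_up, hse_down = half_sphere_counts(ca, ca_to_cb, other_cas)
```
&lt;br /&gt;
Two neighbours fall in the half sphere on the side-chain side, one in the opposite half, and the last one is outside the radius.&lt;br /&gt;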
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from math import pi&lt;br /&gt;
&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector()&lt;br /&gt;
c = residue['C'].get_vector()&lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
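&lt;br /&gt;
What a rotation about an axis does to a vector can be written out with Rodrigues' formula.&lt;br /&gt;
The sketch below uses plain tuples and a hypothetical function name; it follows the&lt;br /&gt;
right-hand rule, and whether &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; uses the same sign convention is an&lt;br /&gt;
assumption that should be verified before relying on it:&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rotate_about_axis(v, axis, theta):
    """Rotate vector v by theta radians about `axis` (right-hand rule)."""
    length = math.sqrt(sum(a * a for a in axis))
    k = [a / length for a in axis]
    k_cross_v = (k[1] * v[2] - k[2] * v[1],
                 k[2] * v[0] - k[0] * v[2],
                 k[0] * v[1] - k[1] * v[0])
    k_dot_v = sum(a * b for a, b in zip(k, v))
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Rodrigues' formula: v cos(t) + (k x v) sin(t) + k (k . v)(1 - cos(t))
    return tuple(v[i] * cos_t + k_cross_v[i] * sin_t + k[i] * k_dot_v * (1.0 - cos_t)
                 for i in range(3))


rotated = rotate_about_axis((1.0, 0.0, 0.0), (0.0, 0.0, 2.0), math.pi / 2)
```
&lt;br /&gt;
A quarter turn about the z axis carries the x unit vector onto the y unit vector (up to rounding).&lt;br /&gt;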
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation matrix that rotates&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from&lt;br /&gt;
''Matrix computations'', 2nd ed., Golub, G. &amp;amp; Van Loan, C. (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rmsd&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
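&lt;br /&gt;
The quantity being minimized, the RMSD between two paired coordinate lists, is easy to&lt;br /&gt;
write down. The sketch below only evaluates the RMSD for coordinates as given; it does&lt;br /&gt;
not perform the SVD fit, and the function name and data are made up for illustration&lt;br /&gt;
(requires Python 3.8 or later for &amp;lt;code&amp;gt;math.dist&amp;lt;/code&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b):
        raise ValueError("both lists must contain the same number of points")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


fixed_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moving_coords = [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)]
value = rmsd(fixed_coords, moving_coords)
```
&lt;br /&gt;
Both pairs are 2 units apart, so the RMSD before any superposition is 2.&lt;br /&gt;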
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro&lt;br /&gt;
Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send me an e-mail (thamelry@binf.ku.dk) or to the Biopython developers (biopython-dev@biopython.org) if you&lt;br /&gt;
have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T03:33:52Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How do I calculate Half Sphere Exposure? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func. Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'', '''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you are looking for a powerful, complete data structure to access the&lt;br /&gt;
atomic data, Bio.PDB is probably for you.&lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
Similarly to the case of PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
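&lt;br /&gt;
Because a tag may map either to a single value or to a list, code that consumes the&lt;br /&gt;
dictionary is simpler if it normalizes both cases first. The sketch below uses a plain&lt;br /&gt;
dictionary with made-up values in place of a real &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; result; the helper&lt;br /&gt;
&amp;lt;code&amp;gt;as_list&amp;lt;/code&amp;gt; is hypothetical, and the assumption that values arrive as strings&lt;br /&gt;
should be checked against your Biopython version:&lt;br /&gt;
&lt;br /&gt;
```python
def as_list(value):
    """Return `value` unchanged if it is a list, else wrap it in a list."""
    return value if isinstance(value, list) else [value]


# Plain dictionary standing in for the result of MMCIF2Dict (made-up values).
toy_dict = {
    "_exptl_crystal.density_percent_sol": "55.2",
    "_atom_site.Cartn_y": ["1.0", "2.5", "-3.0"],
}
solvent = [float(v) for v in as_list(toy_dict["_exptl_crystal.density_percent_sol"])]
y_coords = [float(v) for v in as_list(toy_dict["_atom_site.Cartn_y"])]
```
&lt;br /&gt;
After normalization both tags can be processed with the same loop.&lt;br /&gt;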
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;author&amp;lt;/code&amp;gt; and&lt;br /&gt;
&amp;lt;code&amp;gt;compound&amp;lt;/code&amp;gt; (maps to a dictionary with various information about&lt;br /&gt;
the crystallized compound).&lt;br /&gt;
&lt;br /&gt;
The dictionary can also be created without creating a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object, i.e. directly from the PDB file:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
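from Bio.PDB.parse_pdb_header import parse_pdb_header&lt;br /&gt;
&lt;br /&gt;
# The with-statement closes the file even if parsing fails&lt;br /&gt;
with open(filename, 'r') as handle:&lt;br /&gt;
    header_dict = parse_pdb_header(handle)&lt;br /&gt;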
&amp;lt;/python&amp;gt;&lt;br /&gt;
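&lt;br /&gt;
Since headers are often incomplete, it is safer not to assume that a key is present.&lt;br /&gt;
The sketch below uses a stand-in dictionary (made-up values) and a hypothetical helper, &amp;lt;code&amp;gt;describe_header&amp;lt;/code&amp;gt;, that falls back to defaults for missing records:&lt;br /&gt;
```python
# Stand-in for structure.header (illustrative values; real headers vary).
header = {
    'name': 'example lectin',
    'resolution': 2.8,
    'keywords': 'lectin, agglutinin',
}


def describe_header(header):
    """Summarise a header dictionary, tolerating missing records."""
    resolution = header.get('resolution')
    keywords = header.get('keywords', '')
    return {
        'name': header.get('name', 'unknown'),
        'resolution': 'n/a' if resolution is None else '%.1f A' % resolution,
        'keywords': [k.strip() for k in keywords.split(',') if k.strip()],
    }


summary = describe_header(header)
print(summary)
print(describe_header({}))
```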
&lt;br /&gt;
==== Can I use Bio.PDB with NMR structures (ie. with more than one model)? ====&lt;br /&gt;
&lt;br /&gt;
Sure. Many PDB parsers assume that there is only one model, making&lt;br /&gt;
them all but useless for NMR structures. The design of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object makes it easy to handle PDB files with more than one model&lt;br /&gt;
(see section [[#The Structure object]]). &lt;br /&gt;
&lt;br /&gt;
==== How do I download structures from the PDB? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method of a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object.&lt;br /&gt;
The argument for this method is the PDB identifier of the&lt;br /&gt;
structure.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pdbl = PDBList()&lt;br /&gt;
pdbl.retrieve_pdb_file('1FAT')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; class can also be used as a command-line tool: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py 1fat&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The downloaded file will be called &amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; and stored&lt;br /&gt;
in the current working directory. Note that the &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt;&lt;br /&gt;
method also has an optional argument &amp;lt;code&amp;gt;pdir&amp;lt;/code&amp;gt; that specifies&lt;br /&gt;
the directory in which to store the downloaded PDB files.&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;retrieve_pdb_file&amp;lt;/code&amp;gt; method also has some options for specifying&lt;br /&gt;
the compression format used for the download, and the program used&lt;br /&gt;
for local decompression (default &amp;lt;code&amp;gt;.Z&amp;lt;/code&amp;gt; format and &amp;lt;code&amp;gt;gunzip&amp;lt;/code&amp;gt;).&lt;br /&gt;
In addition, the PDB ftp site can be specified upon creation of the&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. By default, the server of the Worldwide Protein Data Bank (ftp://ftp.wwpdb.org/pub/pdb/data/structures/divided/pdb/)&lt;br /&gt;
is used. See the API documentation for more details. Thanks again&lt;br /&gt;
to Kristian Rother for donating this module.&lt;br /&gt;
&lt;br /&gt;
==== How do I download the entire PDB? ====&lt;br /&gt;
&lt;br /&gt;
The following commands will store all PDB files in the &amp;lt;code&amp;gt;/data/pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
directory: &lt;br /&gt;
&amp;lt;code&amp;gt;&lt;br /&gt;
python PDBList.py all /data/pdb&lt;br /&gt;
python PDBList.py all /data/pdb -d&lt;br /&gt;
&amp;lt;/code&amp;gt;&lt;br /&gt;
The API method for this is called &amp;lt;code&amp;gt;download_entire_pdb&amp;lt;/code&amp;gt;.&lt;br /&gt;
Adding the &amp;lt;code&amp;gt;-d&amp;lt;/code&amp;gt; option will store all files in the same directory.&lt;br /&gt;
Otherwise, they are sorted into PDB-style subdirectories according&lt;br /&gt;
to their PDB IDs. Depending on the traffic, a complete download will&lt;br /&gt;
take 2-4 days.&lt;br /&gt;
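&lt;br /&gt;
To find a downloaded entry again, it helps to know where it should end up. The sketch below assumes the&lt;br /&gt;
&amp;lt;code&amp;gt;pdb1fat.ent&amp;lt;/code&amp;gt; naming mentioned above and the usual PDB 'divided' layout, in which the subdirectory is&lt;br /&gt;
named after the two middle characters of the ID. The &amp;lt;code&amp;gt;local_pdb_path&amp;lt;/code&amp;gt; helper is hypothetical, not part of Bio.PDB:&lt;br /&gt;
```python
import os.path


def local_pdb_path(pdb_id, root, flat=False):
    """Return the expected local path of a downloaded PDB entry.

    Hypothetical helper: it mirrors the naming described above and is
    not part of Bio.PDB.
    """
    code = pdb_id.lower()
    filename = 'pdb%s.ent' % code
    if flat:
        # Equivalent of the -d option: everything in one directory.
        return os.path.join(root, filename)
    # Divided layout: subdirectory named after the two middle characters.
    return os.path.join(root, code[1:3], filename)


print(local_pdb_path('1FAT', 'data'))
print(local_pdb_path('1FAT', 'data', flat=True))
```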
&lt;br /&gt;
==== How do I keep a local copy of the PDB up-to-date? ====&lt;br /&gt;
&lt;br /&gt;
This can also be done using the &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object. One simply&lt;br /&gt;
creates a &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; object (specifying the directory where&lt;br /&gt;
the local copy of the PDB is present) and calls the &amp;lt;code&amp;gt;update_pdb&amp;lt;/code&amp;gt;&lt;br /&gt;
method:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
pl = PDBList(pdb='/data/pdb')&lt;br /&gt;
pl.update_pdb()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
One can of course make a weekly cronjob out of this to keep&lt;br /&gt;
the local copy automatically up-to-date. The PDB ftp site can also&lt;br /&gt;
be specified (see API documentation).&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt; has some additional methods that can be of use. The&lt;br /&gt;
&amp;lt;code&amp;gt;get_all_obsolete&amp;lt;/code&amp;gt; method can be used to get a list of all&lt;br /&gt;
obsolete PDB entries. The &amp;lt;code&amp;gt;changed_this_week&amp;lt;/code&amp;gt; method can&lt;br /&gt;
be used to obtain the entries that were added, modified or obsoleted&lt;br /&gt;
during the current week. For more info on the possibilities of &amp;lt;code&amp;gt;PDBList&amp;lt;/code&amp;gt;,&lt;br /&gt;
see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== What about all those buggy PDB files? ====&lt;br /&gt;
&lt;br /&gt;
It is well known that many PDB files contain semantic errors (I'm&lt;br /&gt;
not talking about the structures themselves now, but their representation&lt;br /&gt;
in PDB files). To deal with this, the &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object&lt;br /&gt;
can behave in two ways: a restrictive way and a permissive&lt;br /&gt;
way (THIS IS NOW THE DEFAULT). The restrictive way used to be the&lt;br /&gt;
default, but people seemed to think that Bio.PDB 'crashed' due to&lt;br /&gt;
a bug (hah!), so I changed it. If you ever encounter a real bug, please&lt;br /&gt;
tell me immediately!&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Permissive parser&lt;br /&gt;
parser = PDBParser(PERMISSIVE=1)&lt;br /&gt;
parser = PDBParser()  # The same (default)&lt;br /&gt;
# Strict parser&lt;br /&gt;
strict_parser = PDBParser(PERMISSIVE=0)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In the permissive state (DEFAULT), PDB files that obviously contain&lt;br /&gt;
errors are 'corrected' (i.e. some residues or atoms are left out).&lt;br /&gt;
These errors include:&lt;br /&gt;
* Multiple residues with the same identifier&lt;br /&gt;
* Multiple atoms with the same identifier (taking into account the altloc identifier)&lt;br /&gt;
These errors indicate real problems in the PDB file (for details see&lt;br /&gt;
the Bioinformatics article). In the restrictive state, PDB files with&lt;br /&gt;
errors cause an exception to occur. This is useful to find errors&lt;br /&gt;
in PDB files.&lt;br /&gt;
&lt;br /&gt;
Some errors however are automatically corrected. Normally each disordered&lt;br /&gt;
atom should have a non-blank altloc identifier. However, there are&lt;br /&gt;
many structures that do not follow this convention, and have a blank&lt;br /&gt;
and a non-blank identifier for two disordered positions of the same&lt;br /&gt;
atom. This is automatically interpreted in the right way.&lt;br /&gt;
&lt;br /&gt;
Sometimes a structure contains a list of residues belonging to chain&lt;br /&gt;
A, followed by residues belonging to chain B, and again followed by&lt;br /&gt;
residues belonging to chain A, i.e. the chains are 'broken'. This&lt;br /&gt;
is also correctly interpreted.&lt;br /&gt;
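&lt;br /&gt;
The difference between the two modes can be illustrated without Bio.PDB at all. The following is only a sketch of the idea&lt;br /&gt;
(skip a duplicate identifier in permissive mode, raise an exception in strict mode); the function and exception names are&lt;br /&gt;
invented for the example and the real parser does considerably more:&lt;br /&gt;
```python
class ConstructionError(Exception):
    """Raised by the strict variant when an identifier is duplicated."""


def collect_residues(records, permissive=True):
    """Build an id-to-name mapping from (residue_id, name) records.

    Illustrative only: duplicates are skipped when permissive is true
    and raise ConstructionError otherwise.
    """
    residues = {}
    skipped = []
    for residue_id, name in records:
        if residue_id in residues:
            if not permissive:
                raise ConstructionError('duplicate residue id %r' % (residue_id,))
            skipped.append((residue_id, name))
            continue
        residues[residue_id] = name
    return residues, skipped


records = [(10, 'SER'), (11, 'GLY'), (10, 'ALA')]
residues, skipped = collect_residues(records)
print(residues, skipped)
```
Calling &amp;lt;code&amp;gt;collect_residues(records, permissive=False)&amp;lt;/code&amp;gt; on the same records raises the exception instead of dropping the second residue 10.&lt;br /&gt;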
&lt;br /&gt;
==== Can I write PDB files? ====&lt;br /&gt;
&lt;br /&gt;
Use the PDBIO class for this. It's easy to write out specific parts&lt;br /&gt;
of a structure too, of course.&lt;br /&gt;
&lt;br /&gt;
Example: saving a structure&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('out.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If you want to write out a part of the structure, make use of the&lt;br /&gt;
&amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; class (also in &amp;lt;code&amp;gt;PDBIO&amp;lt;/code&amp;gt;). Select has four methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
accept_model(model)&lt;br /&gt;
accept_chain(chain)&lt;br /&gt;
accept_residue(residue)&lt;br /&gt;
accept_atom(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
By default, every method returns 1 (which means the model/chain/residue/atom&lt;br /&gt;
is included in the output). By subclassing &amp;lt;code&amp;gt;Select&amp;lt;/code&amp;gt; and returning&lt;br /&gt;
0 when appropriate you can exclude models, chains, etc. from the output.&lt;br /&gt;
Cumbersome maybe, but very powerful. The following code only writes&lt;br /&gt;
out glycine residues:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
class GlySelect(Select):&lt;br /&gt;
    def accept_residue(self, residue):&lt;br /&gt;
        if residue.get_resname() == 'GLY':&lt;br /&gt;
            return 1&lt;br /&gt;
        else:&lt;br /&gt;
            return 0&lt;br /&gt;
&lt;br /&gt;
io = PDBIO()&lt;br /&gt;
io.set_structure(s)&lt;br /&gt;
io.save('gly_only.pdb', GlySelect())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
If this is all too complicated for you, the &amp;lt;code&amp;gt;Dice&amp;lt;/code&amp;gt; module contains&lt;br /&gt;
a handy &amp;lt;code&amp;gt;extract&amp;lt;/code&amp;gt; function that writes out all residues in&lt;br /&gt;
a chain between a start and end residue.&lt;br /&gt;
&lt;br /&gt;
==== Can I write mmCIF files? ====&lt;br /&gt;
&lt;br /&gt;
No, and I also don't have plans to add that functionality soon (or&lt;br /&gt;
ever - I don't need it at all, and it's a lot of work, plus no-one&lt;br /&gt;
has ever asked for it). People who want to add this can contact me.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== The Structure object ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== What's the overall layout of a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object follows the so-called '''SMCRA'''&lt;br /&gt;
(Structure/Model/Chain/Residue/Atom) architecture:&lt;br /&gt;
&lt;br /&gt;
* A structure consists of models&lt;br /&gt;
* A model consists of chains&lt;br /&gt;
* A chain consists of residues&lt;br /&gt;
* A residue consists of atoms&lt;br /&gt;
This is the way many structural biologists/bioinformaticians think&lt;br /&gt;
about structure, and provides a simple but efficient way to deal with&lt;br /&gt;
structure. Additional stuff is essentially added when needed. A UML&lt;br /&gt;
diagram of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object (forget about the &amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;&lt;br /&gt;
classes for now) is shown in the figure below.&lt;br /&gt;
&lt;br /&gt;
''Figure (images/smcra.png): UML diagram of the SMCRA architecture of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;&lt;br /&gt;
object. Full lines with diamonds denote aggregation, full lines with&lt;br /&gt;
arrows denote referencing, full lines with triangles denote inheritance&lt;br /&gt;
and dashed lines with triangles denote interface realization.''&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I navigate through a Structure object? ====&lt;br /&gt;
&lt;br /&gt;
The following code iterates through all atoms of a structure:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.PDBParser import PDBParser&lt;br /&gt;
&lt;br /&gt;
p = PDBParser()&lt;br /&gt;
structure = p.get_structure('X', 'pdb1fat.ent')&lt;br /&gt;
for model in structure:&lt;br /&gt;
    for chain in model:&lt;br /&gt;
        for residue in chain:&lt;br /&gt;
            for atom in residue:&lt;br /&gt;
                print(atom)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
There are also some shortcuts:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Iterate over all atoms in a structure&lt;br /&gt;
for atom in structure.get_atoms():&lt;br /&gt;
    print(atom)&lt;br /&gt;
&lt;br /&gt;
# Iterate over all residues in a model&lt;br /&gt;
for residue in model.get_residues():&lt;br /&gt;
    print(residue)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Structures, models, chains, residues and atoms are called '''Entities'''&lt;br /&gt;
in Biopython. You can always get a parent &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; from a child&lt;br /&gt;
&amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt;, e.g.:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = atom.get_parent()&lt;br /&gt;
chain = residue.get_parent()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also test whether an &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; has a certain child using&lt;br /&gt;
the &amp;lt;code&amp;gt;has_id&amp;lt;/code&amp;gt; method.&lt;br /&gt;
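&lt;br /&gt;
To make the parent/child idea concrete, here is a stripped-down container that supports the same kind of navigation.&lt;br /&gt;
It is a sketch written for this FAQ, not the actual &amp;lt;code&amp;gt;Entity&amp;lt;/code&amp;gt; class of Bio.PDB, and the &amp;lt;code&amp;gt;add&amp;lt;/code&amp;gt; method name is an assumption:&lt;br /&gt;
```python
class Entity(object):
    """Minimal parent/child container (illustrative, not Bio.PDB's Entity)."""

    def __init__(self, entity_id):
        self.id = entity_id
        self.parent = None
        self.children = {}

    def add(self, child):
        child.parent = self
        self.children[child.id] = child

    def get_parent(self):
        return self.parent

    def has_id(self, child_id):
        return child_id in self.children

    def __getitem__(self, child_id):
        return self.children[child_id]

    def __iter__(self):
        return iter(self.children.values())


# Build structure X, model 0, chain A, residue 100, atom CA.
structure = Entity('X')
levels = [structure]
for child_id in (0, 'A', 100, 'CA'):
    child = Entity(child_id)
    levels[-1].add(child)
    levels.append(child)

atom = structure[0]['A'][100]['CA']
print(atom.get_parent().id, structure[0]['A'].has_id(100))
```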
&lt;br /&gt;
==== Can I do that a bit more conveniently? ====&lt;br /&gt;
&lt;br /&gt;
You can do things like:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atoms = structure.get_atoms()&lt;br /&gt;
residues = structure.get_residues()&lt;br /&gt;
atoms = chain.get_atoms()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also use the &amp;lt;code&amp;gt;Selection.unfold_entities&amp;lt;/code&amp;gt; function:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get all residues from a structure&lt;br /&gt;
res_list = Selection.unfold_entities(structure, 'R')&lt;br /&gt;
# Get all atoms from a chain&lt;br /&gt;
atom_list = Selection.unfold_entities(chain, 'A')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Obviously, &amp;lt;code&amp;gt;A&amp;lt;/code&amp;gt;=atom, &amp;lt;code&amp;gt;R&amp;lt;/code&amp;gt;=residue, &amp;lt;code&amp;gt;C&amp;lt;/code&amp;gt;=chain, &amp;lt;code&amp;gt;M&amp;lt;/code&amp;gt;=model, &amp;lt;code&amp;gt;S&amp;lt;/code&amp;gt;=structure.&lt;br /&gt;
You can use this to go up in the hierarchy, e.g. to get a list of (unique) &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; parents from a list of&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;s:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue_list = Selection.unfold_entities(atom_list, 'R')&lt;br /&gt;
chain_list = Selection.unfold_entities(atom_list, 'C')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
For more info, see the API documentation.&lt;br /&gt;
&lt;br /&gt;
==== How do I extract a specific &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt;? ====&lt;br /&gt;
&lt;br /&gt;
Easy. Here are some examples:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
chain = model['A']&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
atom = residue['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Note that you can use a shortcut:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom = structure[0]['A'][100]['CA']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== What is a model id? ====&lt;br /&gt;
&lt;br /&gt;
The model id is an integer which denotes the rank of the model in&lt;br /&gt;
the PDB/mmCIF file. The model id starts at 0. Crystal structures generally&lt;br /&gt;
have only one model (with id 0), while NMR files usually have several&lt;br /&gt;
models.&lt;br /&gt;
&lt;br /&gt;
==== What is a chain id? ====&lt;br /&gt;
&lt;br /&gt;
The chain id is specified in the PDB/mmCIF file, and is a single character&lt;br /&gt;
(typically a letter). &lt;br /&gt;
&lt;br /&gt;
==== What is a residue id? ====&lt;br /&gt;
&lt;br /&gt;
This is a bit more complicated, due to the clumsy PDB format. A residue&lt;br /&gt;
id is a tuple with three elements:&lt;br /&gt;
* The '''hetero-flag''': this is &amp;lt;code&amp;gt;'H_'&amp;lt;/code&amp;gt; plus the name of the hetero-residue (e.g. &amp;lt;code&amp;gt;'H_GLC'&amp;lt;/code&amp;gt; in the case of a glucose molecule), or &amp;lt;code&amp;gt;'W'&amp;lt;/code&amp;gt; in the case of a water molecule.&lt;br /&gt;
* The '''sequence identifier''' in the chain, e.g. 100&lt;br /&gt;
* The '''insertion code''', e.g. &amp;lt;code&amp;gt;'A'&amp;lt;/code&amp;gt;. The insertion code is sometimes used to preserve a certain desirable residue numbering scheme. A Ser 80 insertion mutant (inserted e.g. between a Thr 80 and an Asn 81 residue) could e.g. have sequence identifiers and insertion codes as follows: Thr 80 A, Ser 80 B, Asn 81. In this way the residue numbering scheme stays in tune with that of the wild type structure.&lt;br /&gt;
&lt;br /&gt;
The id of the above glucose residue would thus be &amp;lt;code&amp;gt;('H_GLC', 100, 'A')&amp;lt;/code&amp;gt;. If the hetero-flag and insertion code are blank, the sequence&lt;br /&gt;
identifier alone can be used:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Full id&lt;br /&gt;
residue = chain[(' ', 100, ' ')]&lt;br /&gt;
# Shortcut id&lt;br /&gt;
residue = chain[100]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
The reason for the hetero-flag is that many, many PDB files use the same sequence identifier for an amino acid and a hetero-residue or a water, which would create obvious problems if the hetero-flag was not used.&lt;br /&gt;
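&lt;br /&gt;
The following sketch shows why the three-element id is needed and how an integer shortcut can be expanded to it. The chain is a plain&lt;br /&gt;
dictionary standing in for a &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt; object, and &amp;lt;code&amp;gt;full_residue_id&amp;lt;/code&amp;gt; is a hypothetical helper:&lt;br /&gt;
```python
def full_residue_id(key):
    """Expand an integer shortcut to (hetero-flag, sequence id, insertion code)."""
    if isinstance(key, int):
        return (' ', key, ' ')
    return key


# Stand-in chain: three different residues share sequence identifier 100.
chain = {
    (' ', 100, ' '): 'GLY',
    ('H_GLC', 100, 'A'): 'GLC',
    ('W', 100, ' '): 'HOH',
}

print(chain[full_residue_id(100)])
print(chain[full_residue_id(('H_GLC', 100, 'A'))])
```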
&lt;br /&gt;
==== What is an atom id? ====&lt;br /&gt;
&lt;br /&gt;
The atom id is simply the atom name (e.g. &amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt;). In practice,&lt;br /&gt;
the atom name is created by stripping all spaces from the atom name&lt;br /&gt;
in the PDB file. &lt;br /&gt;
&lt;br /&gt;
However, in PDB files, a space can be part of an atom name. Often,&lt;br /&gt;
calcium atoms are called &amp;lt;code&amp;gt;'CA..'&amp;lt;/code&amp;gt; in order to distinguish them&lt;br /&gt;
from C&amp;amp;alpha; atoms (which are called &amp;lt;code&amp;gt;'.CA.'&amp;lt;/code&amp;gt;; the dots stand for spaces here). In cases&lt;br /&gt;
where stripping the spaces would create problems (i.e. two atoms called&lt;br /&gt;
&amp;lt;code&amp;gt;'CA'&amp;lt;/code&amp;gt; in the same residue) the spaces are kept.&lt;br /&gt;
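&lt;br /&gt;
One possible reading of this rule is sketched below, using real spaces in the four-character names. The &amp;lt;code&amp;gt;atom_ids&amp;lt;/code&amp;gt; function is invented for&lt;br /&gt;
the example, and the exact behaviour of Bio.PDB on a name clash may differ (here both clashing atoms keep their full names):&lt;br /&gt;
```python
def atom_ids(fullnames):
    """Map 4-character PDB atom names to ids within one residue.

    Spaces are stripped unless that would give two atoms the same id,
    in which case the full names are kept. Illustrative sketch only.
    """
    stripped = [name.strip() for name in fullnames]
    ids = []
    for full, short in zip(fullnames, stripped):
        ids.append(short if stripped.count(short) == 1 else full)
    return ids


print(atom_ids([' N  ', ' CA ', ' C  ']))
print(atom_ids([' CA ', 'CA  ']))
```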
&lt;br /&gt;
==== How is disorder handled? ====&lt;br /&gt;
&lt;br /&gt;
This is one of the strong points of Bio.PDB. It can handle both disordered&lt;br /&gt;
atoms and point mutations (i.e. a Gly and an Ala residue in the same&lt;br /&gt;
position). &lt;br /&gt;
&lt;br /&gt;
Disorder should be dealt with from two points of view: the atom and&lt;br /&gt;
the residue points of view. In general, I have tried to encapsulate&lt;br /&gt;
all the complexity that arises from disorder. If you just want to&lt;br /&gt;
loop over all C&amp;amp;alpha; atoms, you do not care that some residues&lt;br /&gt;
have a disordered side chain. On the other hand it should also be&lt;br /&gt;
possible to represent disorder completely in the data structure. Therefore,&lt;br /&gt;
disordered atoms or residues are stored in special objects that behave&lt;br /&gt;
as if there is no disorder. This is done by only representing a subset&lt;br /&gt;
of the disordered atoms or residues. Which subset is picked (e.g.&lt;br /&gt;
which of the two disordered OG side chain atom positions of a Ser&lt;br /&gt;
residue is used) can be specified by the user.&lt;br /&gt;
&lt;br /&gt;
'''Disordered atom positions''' are represented by ordinary &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
objects, but all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects that represent the same physical&lt;br /&gt;
atom are stored in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object (see the UML diagram of the SMCRA architecture).&lt;br /&gt;
Each &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object can&lt;br /&gt;
be uniquely indexed using its altloc specifier. The &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt;&lt;br /&gt;
object forwards all uncaught method calls to the selected Atom object,&lt;br /&gt;
by default the one that represents the atom with the highest&lt;br /&gt;
occupancy. The user can of course change the selected &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt;&lt;br /&gt;
object, making use of its altloc specifier. In this way atom disorder&lt;br /&gt;
is represented correctly without much additional complexity. In other&lt;br /&gt;
words, if you are not interested in atom disorder, you will not be&lt;br /&gt;
bothered by it.&lt;br /&gt;
&lt;br /&gt;
Each disordered atom has a characteristic altloc identifier. You can&lt;br /&gt;
specify that a &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; object should behave like&lt;br /&gt;
the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object associated with a specific altloc identifier:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
atom.disordered_select('A')  # select altloc A atom&lt;br /&gt;
atom.disordered_select('B')  # select altloc B atom &lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
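&lt;br /&gt;
The forwarding behaviour described above can be sketched in a few lines of plain Python. The classes below are stand-ins written for this&lt;br /&gt;
example (the name &amp;lt;code&amp;gt;DisorderedAtomSketch&amp;lt;/code&amp;gt; and the attribute names are assumptions); they only show the mechanism, not the real implementation:&lt;br /&gt;
```python
class Atom(object):
    """Stand-in atom carrying one alternative position."""

    def __init__(self, name, altloc, occupancy, coord):
        self.name = name
        self.altloc = altloc
        self.occupancy = occupancy
        self.coord = coord

    def get_coord(self):
        return self.coord

    def get_occupancy(self):
        return self.occupancy


class DisorderedAtomSketch(object):
    """Wrapper that forwards unknown attributes to the selected Atom."""

    def __init__(self, atoms):
        self.children = dict((atom.altloc, atom) for atom in atoms)
        # Default selection: the position with the highest occupancy.
        self.selected = max(atoms, key=lambda atom: atom.occupancy)

    def disordered_select(self, altloc):
        self.selected = self.children[altloc]

    def __getattr__(self, name):
        # Reached only when normal lookup fails, i.e. for names the wrapper
        # does not define itself; these are forwarded to the selected Atom.
        if name in ('selected', 'children'):
            raise AttributeError(name)
        return getattr(self.selected, name)


og = DisorderedAtomSketch([
    Atom('OG', 'A', 0.4, (1.0, 2.0, 3.0)),
    Atom('OG', 'B', 0.6, (1.5, 2.5, 3.5)),
])
print(og.get_coord())        # position B, the highest occupancy
og.disordered_select('A')
print(og.get_coord())        # position A after explicit selection
```
&lt;br /&gt;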
A special case arises when disorder is due to '''point mutations''',&lt;br /&gt;
i.e. when two or more point mutants of a polypeptide are present in&lt;br /&gt;
the crystal. An example of this can be found in PDB structure 1EN2.&lt;br /&gt;
&lt;br /&gt;
Since these residues belong to a different residue type (e.g. let's&lt;br /&gt;
say Ser 60 and Cys 60) they should not be stored in a single &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt;&lt;br /&gt;
object as in the common case. In this case, each residue is represented&lt;br /&gt;
by one &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object, and both &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
are stored in a single &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object (see the&lt;br /&gt;
UML diagram of the SMCRA architecture).&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt; object forwards all uncaught&lt;br /&gt;
methods to the selected &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object (by default the last&lt;br /&gt;
&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object added), and thus behaves like an ordinary&lt;br /&gt;
residue. Each &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object can be uniquely identified by its residue name. In the above&lt;br /&gt;
example, residue Ser 60 would have id 'SER' in the &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object, while residue Cys 60 would have id 'CYS'. The user can select&lt;br /&gt;
the active &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object in a &amp;lt;code&amp;gt;DisorderedResidue&amp;lt;/code&amp;gt;&lt;br /&gt;
object via this id.&lt;br /&gt;
&lt;br /&gt;
Example: suppose that a chain has a point mutation at position 10,&lt;br /&gt;
consisting of a Ser and a Cys residue. Make sure that residue 10 of&lt;br /&gt;
this chain behaves as the Cys residue.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
residue = chain[10]&lt;br /&gt;
residue.disordered_select('CYS')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
In addition, you can get a list of all &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects (i.e.&lt;br /&gt;
all &amp;lt;code&amp;gt;DisorderedAtom&amp;lt;/code&amp;gt; objects are 'unpacked' to their individual&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects) using the &amp;lt;code&amp;gt;get_unpacked_list&amp;lt;/code&amp;gt; method&lt;br /&gt;
of a (&amp;lt;code&amp;gt;Disordered&amp;lt;/code&amp;gt;)&amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
==== Can I sort residues in a chain somehow? ====&lt;br /&gt;
&lt;br /&gt;
Yes, kinda, but I'm waiting for a request for this feature to finish&lt;br /&gt;
it :-).&lt;br /&gt;
&lt;br /&gt;
==== How are ligands and solvent handled? ====&lt;br /&gt;
&lt;br /&gt;
See 'What is a residue id?'.&lt;br /&gt;
&lt;br /&gt;
==== What about B factors? ====&lt;br /&gt;
&lt;br /&gt;
Well, yes! Bio.PDB supports isotropic and anisotropic B factors, and&lt;br /&gt;
also deals with standard deviations of anisotropic B factor if present&lt;br /&gt;
(see the section [[#Analysis]]).&lt;br /&gt;
&lt;br /&gt;
==== What about standard deviation of atomic positions? ====&lt;br /&gt;
&lt;br /&gt;
Yup, supported. See the section [[#Analysis]].&lt;br /&gt;
&lt;br /&gt;
==== I think the SMCRA data structure is not flexible/sexy/whatever enough... ====&lt;br /&gt;
&lt;br /&gt;
Sure, sure. Everybody is always coming up with (mostly vaporware or&lt;br /&gt;
partly implemented) data structures that handle all possible situations&lt;br /&gt;
and are extensible in all thinkable (and unthinkable) ways. The prosaic&lt;br /&gt;
truth however is that 99.9% of people using (and I mean really using!)&lt;br /&gt;
crystal structures think in terms of models, chains, residues and&lt;br /&gt;
atoms. The philosophy of Bio.PDB is to provide a reasonably fast,&lt;br /&gt;
clean, simple, but complete data structure to access structure data.&lt;br /&gt;
The proof of the pudding is in the eating.&lt;br /&gt;
&lt;br /&gt;
Moreover, it is quite easy to build more specialised data structures&lt;br /&gt;
on top of the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; class (e.g. there's a &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
class). On the other hand, the &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object is built&lt;br /&gt;
using a Parser/Consumer approach (called &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt;/&amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
and &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;, respectively). One can easily reuse&lt;br /&gt;
the PDB/mmCIF parsers by implementing a specialised &amp;lt;code&amp;gt;StructureBuilder&amp;lt;/code&amp;gt;&lt;br /&gt;
class. It is of course also trivial to add support for new file formats&lt;br /&gt;
by writing new parsers.&lt;br /&gt;
&lt;br /&gt;
=== Analysis ===&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from an &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
a.get_name()           # atom name (spaces stripped, e.g. 'CA')&lt;br /&gt;
a.get_id()             # id (equals atom name)&lt;br /&gt;
a.get_coord()          # atomic coordinates&lt;br /&gt;
a.get_vector()         # atomic coordinates as Vector object&lt;br /&gt;
a.get_bfactor()        # isotropic B factor&lt;br /&gt;
a.get_occupancy()      # occupancy&lt;br /&gt;
a.get_altloc()         # alternative location specifier&lt;br /&gt;
a.get_sigatm()         # std. dev. of atomic parameters&lt;br /&gt;
a.get_siguij()         # std. dev. of anisotropic B factor&lt;br /&gt;
a.get_anisou()         # anisotropic B factor&lt;br /&gt;
a.get_fullname()       # atom name (with spaces, e.g. '.CA.')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I extract information from a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Using the following methods:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
r.get_resname()         # return the residue name (e.g. 'GLY')&lt;br /&gt;
r.is_disordered()       # 1 if the residue has disordered atoms&lt;br /&gt;
r.get_segid()           # return the SEGID&lt;br /&gt;
r.has_id(name)          # test if a residue has a certain atom&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I measure distances? ====&lt;br /&gt;
&lt;br /&gt;
That's simple: the minus operator for atoms has been overloaded to&lt;br /&gt;
return the distance between two atoms. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Get some atoms&lt;br /&gt;
ca1 = residue1['CA']&lt;br /&gt;
ca2 = residue2['CA']&lt;br /&gt;
# Simply subtract the atoms to get their distance&lt;br /&gt;
distance = ca1-ca2&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
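&lt;br /&gt;
If you wonder how subtracting two objects can yield a number: the class simply defines &amp;lt;code&amp;gt;__sub__&amp;lt;/code&amp;gt; to return the Euclidean distance.&lt;br /&gt;
A minimal sketch with a stand-in &amp;lt;code&amp;gt;Point&amp;lt;/code&amp;gt; class (not the real &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; class):&lt;br /&gt;
```python
import math


class Point(object):
    """Stand-in for an atom that only carries coordinates."""

    def __init__(self, x, y, z):
        self.coord = (x, y, z)

    def __sub__(self, other):
        # Return the Euclidean distance, not a difference vector.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(self.coord, other.coord)))


ca1 = Point(0.0, 0.0, 0.0)
ca2 = Point(3.0, 4.0, 0.0)
distance = ca1 - ca2
print(distance)
```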
&lt;br /&gt;
==== How do I measure angles? ====&lt;br /&gt;
&lt;br /&gt;
This can easily be done via the vector representation of the atomic coordinates, and the &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; function from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1 = atom1.get_vector()&lt;br /&gt;
vector2 = atom2.get_vector()&lt;br /&gt;
vector3 = atom3.get_vector()&lt;br /&gt;
angle = calc_angle(vector1, vector2, vector3)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
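&lt;br /&gt;
For reference, this is the geometry involved: the angle is measured at the middle point, between the directions to the two outer points.&lt;br /&gt;
The sketch below uses plain tuples and a hypothetical &amp;lt;code&amp;gt;angle_between&amp;lt;/code&amp;gt; function, and returns radians (which, as far as I know, is also what &amp;lt;code&amp;gt;calc_angle&amp;lt;/code&amp;gt; does):&lt;br /&gt;
```python
import math


def angle_between(p1, p2, p3):
    """Angle in radians at p2, between the directions to p1 and to p3."""
    u = [a - b for a, b in zip(p1, p2)]
    v = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(a * a for a in v))
    # Clamp to [-1, 1] to guard against rounding error before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


angle = angle_between((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(math.degrees(angle))
```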
&lt;br /&gt;
==== How do I measure torsion angles? ====&lt;br /&gt;
&lt;br /&gt;
Again, this can easily be done via the vector representation of the&lt;br /&gt;
atomic coordinates, this time using the &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; function&lt;br /&gt;
from the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
vector1=atom1.get_vector()&lt;br /&gt;
vector2=atom2.get_vector()&lt;br /&gt;
vector3=atom3.get_vector()&lt;br /&gt;
vector4=atom4.get_vector()&lt;br /&gt;
angle=calc_dihedral(vector1, vector2, vector3, vector4)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
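&lt;br /&gt;
The torsion angle itself can be computed from the three bond vectors with the standard atan2 formula. The sketch below uses plain tuples;&lt;br /&gt;
the helper names are made up, and it is not necessarily how &amp;lt;code&amp;gt;calc_dihedral&amp;lt;/code&amp;gt; is implemented, although the sign convention should agree:&lt;br /&gt;
```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p1, p2, p3, p4):
    """Torsion angle in radians, in the range -pi to pi."""
    b1, b2, b3 = sub(p2, p1), sub(p3, p2), sub(p4, p3)
    # Normals of the planes (p1, p2, p3) and (p2, p3, p4).
    n1, n2 = cross(b1, b2), cross(b2, b3)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.atan2(y, x)


torsion = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0),
                   (1.0, 0.0, 0.0), (1.0, 0.0, 1.0))
print(math.degrees(torsion))
```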
&lt;br /&gt;
==== How do I determine atom-atom contacts? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;NeighborSearch&amp;lt;/code&amp;gt;. This uses a KD tree data structure coded&lt;br /&gt;
in C behind the scenes, so it's pretty darn fast (see &amp;lt;code&amp;gt;Bio.KDTree&amp;lt;/code&amp;gt;).&lt;br /&gt;
&lt;br /&gt;
==== How do I extract polypeptides from a &amp;lt;code&amp;gt;Structure&amp;lt;/code&amp;gt; object? ====&lt;br /&gt;
&lt;br /&gt;
Use a polypeptide builder (&amp;lt;code&amp;gt;PPBuilder&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;CaPPBuilder&amp;lt;/code&amp;gt;). You can use the resulting &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt;&lt;br /&gt;
object to get the sequence as a &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object or to get a list&lt;br /&gt;
of C&amp;amp;alpha; atoms as well. Polypeptides can be built using a C-N&lt;br /&gt;
or a C&amp;amp;alpha;-C&amp;amp;alpha; distance criterion.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# Using C-N&lt;br /&gt;
ppb = PPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&lt;br /&gt;
# Using CA-CA&lt;br /&gt;
ppb = CaPPBuilder()&lt;br /&gt;
for pp in ppb.build_peptides(structure):&lt;br /&gt;
    print(pp.get_sequence())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that in the above case only model 0 of the structure is considered&lt;br /&gt;
by the polypeptide builder. However, it is possible to use the builders&lt;br /&gt;
to build &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects from &amp;lt;code&amp;gt;Model&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;Chain&amp;lt;/code&amp;gt;&lt;br /&gt;
objects as well.&lt;br /&gt;
&lt;br /&gt;
==== How do I get the sequence of a structure? ====&lt;br /&gt;
&lt;br /&gt;
The first thing to do is to extract all polypeptides from the structure&lt;br /&gt;
(see previous entry). The sequence of each polypeptide can then easily&lt;br /&gt;
be obtained from the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; objects. The sequence is&lt;br /&gt;
represented as a Biopython &amp;lt;code&amp;gt;Seq&amp;lt;/code&amp;gt; object, and its alphabet is&lt;br /&gt;
defined by a &amp;lt;code&amp;gt;ProteinAlphabet&amp;lt;/code&amp;gt; object.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq = polypeptide.get_sequence()&lt;br /&gt;
&amp;gt;&amp;gt;&amp;gt; seq&lt;br /&gt;
Seq('SNVVE...', &amp;lt;class Bio.Alphabet.ProteinAlphabet&amp;gt;)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I determine secondary structure? ====&lt;br /&gt;
&lt;br /&gt;
For this functionality, you need to install DSSP (and obtain a license&lt;br /&gt;
for it - free for academic use, see http://www.cmbi.kun.nl/gv/dssp/).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class, which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects&lt;br /&gt;
to their secondary structure (and accessible surface area). The DSSP&lt;br /&gt;
codes are listed in the table below. Note that DSSP (the&lt;br /&gt;
program, and consequently the class) cannot handle multiple&lt;br /&gt;
models!&lt;br /&gt;
&lt;br /&gt;
{| border=1&lt;br /&gt;
|+ DSSP codes in Bio.PDB&lt;br /&gt;
! Code !! Secondary structure&lt;br /&gt;
|-&lt;br /&gt;
| H || &amp;amp;alpha;-helix&lt;br /&gt;
|-&lt;br /&gt;
| B || Isolated &amp;amp;beta;-bridge residue&lt;br /&gt;
|-&lt;br /&gt;
| E || Strand&lt;br /&gt;
|-&lt;br /&gt;
| G || 3-10 helix&lt;br /&gt;
|-&lt;br /&gt;
| I || &amp;amp;pi;-helix&lt;br /&gt;
|-&lt;br /&gt;
| T || Turn&lt;br /&gt;
|-&lt;br /&gt;
| S || Bend&lt;br /&gt;
|-&lt;br /&gt;
| - || Other&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate the accessible surface area of a residue? ====&lt;br /&gt;
&lt;br /&gt;
Use the &amp;lt;code&amp;gt;DSSP&amp;lt;/code&amp;gt; class (see also previous entry). But see also&lt;br /&gt;
next entry.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate residue depth? ====&lt;br /&gt;
&lt;br /&gt;
Residue depth is the average distance of a residue's atoms from the&lt;br /&gt;
solvent accessible surface. It's a fairly new and very powerful parameterization&lt;br /&gt;
of solvent accessibility. For this functionality, you need to install&lt;br /&gt;
Michel Sanner's MSMS program (http://www.scripps.edu/pub/olson-web/people/sanner/html/msms_home.html).&lt;br /&gt;
Then use the &amp;lt;code&amp;gt;ResidueDepth&amp;lt;/code&amp;gt; class. This class behaves as a&lt;br /&gt;
dictionary which maps &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; objects to corresponding (residue&lt;br /&gt;
depth, C&amp;amp;alpha; depth) tuples. The C&amp;amp;alpha; depth is the distance&lt;br /&gt;
of a residue's C&amp;amp;alpha; atom to the solvent accessible surface. &lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
rd = ResidueDepth(model, pdb_file)&lt;br /&gt;
residue_depth, ca_depth = rd[some_residue]&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
You can also get access to the molecular surface itself (via the &amp;lt;code&amp;gt;get_surface&amp;lt;/code&amp;gt;&lt;br /&gt;
function), in the form of a NumPy array with the surface points.&lt;br /&gt;
&lt;br /&gt;
==== How do I calculate Half Sphere Exposure? ====&lt;br /&gt;
&lt;br /&gt;
Half Sphere Exposure (HSE) is a new, 2D measure of solvent exposure.&lt;br /&gt;
Basically, it counts the number of C&amp;amp;alpha; atoms around a residue&lt;br /&gt;
in the direction of its side chain, and in the opposite direction&lt;br /&gt;
(within a radius of 13 Å). Despite its simplicity, it outperforms&lt;br /&gt;
many other measures of solvent exposure. An article describing this&lt;br /&gt;
novel 2D measure has been submitted.&lt;br /&gt;
&lt;br /&gt;
HSE comes in two flavors: HSE&amp;amp;alpha; and HSE&amp;amp;beta;. The former&lt;br /&gt;
only uses the C&amp;amp;alpha; atom positions, while the latter uses the&lt;br /&gt;
C&amp;amp;alpha; and C&amp;amp;beta; atom positions. The HSE measure is calculated&lt;br /&gt;
by the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; class, which can also calculate the contact&lt;br /&gt;
number. Its methods return dictionaries that&lt;br /&gt;
map a &amp;lt;code&amp;gt;Residue&amp;lt;/code&amp;gt; object to its corresponding HSE&amp;amp;alpha;, HSE&amp;amp;beta;&lt;br /&gt;
and contact number values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
model = structure[0]&lt;br /&gt;
hse = HSExposure()&lt;br /&gt;
# Calculate HSEalpha&lt;br /&gt;
exp_ca = hse.calc_hs_exposure(model, option='CA3')&lt;br /&gt;
# Calculate HSEbeta&lt;br /&gt;
exp_cb = hse.calc_hs_exposure(model, option='CB')&lt;br /&gt;
# Calculate classical coordination number&lt;br /&gt;
exp_fs = hse.calc_fs_exposure(model)&lt;br /&gt;
# Print HSEalpha for a residue&lt;br /&gt;
print(exp_ca[some_residue])&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I map the residues of two related structures onto each other? ====&lt;br /&gt;
&lt;br /&gt;
First, create an alignment file in FASTA format, then use the &amp;lt;code&amp;gt;StructureAlignment&amp;lt;/code&amp;gt;&lt;br /&gt;
class. This class can also be used for alignments with more than two&lt;br /&gt;
structures.&lt;br /&gt;
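&lt;br /&gt;
A rough sketch of this workflow follows. It assumes that the FASTA alignment can be read with&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.AlignIO&amp;lt;/code&amp;gt;, that its first two rows correspond to the two structures, and that the&lt;br /&gt;
aligned sequences match the residues in each model; the function name and file arguments are hypothetical.&lt;br /&gt;
Older Biopython releases may expect the alignment object to be created differently.&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio import AlignIO&lt;br /&gt;
from Bio.PDB import PDBParser&lt;br /&gt;
from Bio.PDB.StructureAlignment import StructureAlignment&lt;br /&gt;
&lt;br /&gt;
def map_residues(fasta_file, pdb_file1, pdb_file2):&lt;br /&gt;
    # Parse both structures and keep model 0 of each&lt;br /&gt;
    parser = PDBParser()&lt;br /&gt;
    model1 = parser.get_structure('first', pdb_file1)[0]&lt;br /&gt;
    model2 = parser.get_structure('second', pdb_file2)[0]&lt;br /&gt;
    # Read the sequence alignment of the two structures&lt;br /&gt;
    with open(fasta_file) as handle:&lt;br /&gt;
        alignment = AlignIO.read(handle, 'fasta')&lt;br /&gt;
    # si and sj are the alignment rows belonging to model1 and model2&lt;br /&gt;
    sa = StructureAlignment(alignment, model1, model2, si=0, sj=1)&lt;br /&gt;
    # Two dictionaries: residues of model1 to model2, and the reverse&lt;br /&gt;
    return sa.get_maps()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Residues aligned to a gap are expected to map to &amp;lt;code&amp;gt;None&amp;lt;/code&amp;gt;, so check for that before using a mapped residue.&lt;br /&gt;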
&lt;br /&gt;
==== How do I test if a Residue object is an amino acid? ====&lt;br /&gt;
&lt;br /&gt;
Use &amp;lt;code&amp;gt;is_aa(residue)&amp;lt;/code&amp;gt;.&lt;br /&gt;
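&lt;br /&gt;
For instance, a small sketch that filters a chain down to its amino acid residues (the helper&lt;br /&gt;
name &amp;lt;code&amp;gt;amino_acids&amp;lt;/code&amp;gt; is an illustrative assumption). As far as I know, &amp;lt;code&amp;gt;is_aa&amp;lt;/code&amp;gt;&lt;br /&gt;
also accepts a three-letter residue name, which is handy for quick checks:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB.Polypeptide import is_aa&lt;br /&gt;
&lt;br /&gt;
def amino_acids(chain):&lt;br /&gt;
    # Keep only the residues that are amino acids (drops waters, ligands, ...)&lt;br /&gt;
    return [residue for residue in chain if is_aa(residue)]&lt;br /&gt;
&lt;br /&gt;
# Checking residue names directly&lt;br /&gt;
ala_is_aa = is_aa('ALA')&lt;br /&gt;
water_is_aa = is_aa('HOH')&lt;br /&gt;
print(ala_is_aa, water_is_aa)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;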
&lt;br /&gt;
==== Can I do vector operations on atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
&amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; objects return a &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; object representation&lt;br /&gt;
of the coordinates with the &amp;lt;code&amp;gt;get_vector&amp;lt;/code&amp;gt; method. &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
implements the full set of 3D vector operations, matrix multiplication&lt;br /&gt;
(left and right) and some advanced rotation-related operations as&lt;br /&gt;
well. See also next question.&lt;br /&gt;
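&lt;br /&gt;
A short sketch of the basic operations, using two made-up unit vectors instead of real atomic&lt;br /&gt;
coordinates (for a real atom you would start from &amp;lt;code&amp;gt;atom.get_vector()&amp;lt;/code&amp;gt;):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Vector&lt;br /&gt;
&lt;br /&gt;
a = Vector(1.0, 0.0, 0.0)&lt;br /&gt;
b = Vector(0.0, 1.0, 0.0)&lt;br /&gt;
&lt;br /&gt;
total = a + b          # element-wise sum&lt;br /&gt;
dot = a * b            # dot product&lt;br /&gt;
cross = a ** b         # cross product&lt;br /&gt;
length = total.norm()  # Euclidean length&lt;br /&gt;
angle = a.angle(b)     # angle between a and b, in radians&lt;br /&gt;
&lt;br /&gt;
print(total, dot, cross, length, angle)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;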
&lt;br /&gt;
==== How do I put a virtual C&amp;amp;beta; on a Gly residue? ====&lt;br /&gt;
&lt;br /&gt;
OK, I admit, this example is only present to show off the possibilities&lt;br /&gt;
of Bio.PDB's &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module (though this code is actually&lt;br /&gt;
used in the &amp;lt;code&amp;gt;HSExposure&amp;lt;/code&amp;gt; module, which contains a novel way&lt;br /&gt;
to parametrize residue exposure - publication underway). Suppose that&lt;br /&gt;
you would like to find the position of a Gly residue's C&amp;amp;beta; atom,&lt;br /&gt;
if it had one. How would you do that? Well, rotating the N atom of&lt;br /&gt;
the Gly residue along the C&amp;amp;alpha;-C bond over -120 degrees roughly&lt;br /&gt;
puts it in the position of a virtual C&amp;amp;beta; atom. Here's how to&lt;br /&gt;
do it, making use of the &amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; method (which can be used&lt;br /&gt;
to construct a rotation around a certain axis) of the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt;&lt;br /&gt;
module:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
# get atom coordinates as vectors&lt;br /&gt;
n = residue['N'].get_vector() &lt;br /&gt;
c = residue['C'].get_vector() &lt;br /&gt;
ca = residue['CA'].get_vector()&lt;br /&gt;
# center at origin&lt;br /&gt;
n = n - ca&lt;br /&gt;
c = c - ca&lt;br /&gt;
# find rotation matrix that rotates n -120 degrees along the ca-c vector&lt;br /&gt;
# (pi comes from the math module)&lt;br /&gt;
rot = rotaxis(-pi * 120.0/180.0, c)&lt;br /&gt;
# apply rotation to ca-n vector&lt;br /&gt;
cb_at_origin = n.left_multiply(rot)&lt;br /&gt;
# put on top of ca atom&lt;br /&gt;
cb = cb_at_origin + ca&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
This example shows that it's possible to do some quite nontrivial&lt;br /&gt;
vector operations on atomic data, which can be quite useful. In addition&lt;br /&gt;
to all the usual vector operations (cross (use &amp;lt;code&amp;gt;**&amp;lt;/code&amp;gt;), and&lt;br /&gt;
dot (use &amp;lt;code&amp;gt;*&amp;lt;/code&amp;gt;) product, angle, norm, etc.) and the above mentioned&lt;br /&gt;
&amp;lt;code&amp;gt;rotaxis&amp;lt;/code&amp;gt; function, the &amp;lt;code&amp;gt;Vector&amp;lt;/code&amp;gt; module also has methods&lt;br /&gt;
to rotate (&amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;) or reflect (&amp;lt;code&amp;gt;refmat&amp;lt;/code&amp;gt;) one vector&lt;br /&gt;
on top of another.&lt;br /&gt;
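&lt;br /&gt;
As a small illustration of &amp;lt;code&amp;gt;rotmat&amp;lt;/code&amp;gt;, the sketch below rotates one arbitrary vector onto&lt;br /&gt;
the direction of another. The two vectors are made up for the example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import Vector, rotmat&lt;br /&gt;
&lt;br /&gt;
p = Vector(1.0, 2.0, 3.0)&lt;br /&gt;
q = Vector(2.0, 3.0, 5.0)&lt;br /&gt;
&lt;br /&gt;
# Matrix that rotates p onto the direction of q&lt;br /&gt;
rot = rotmat(p, q)&lt;br /&gt;
p_rotated = p.left_multiply(rot)&lt;br /&gt;
&lt;br /&gt;
# The rotated vector keeps the length of p but points along q&lt;br /&gt;
print(p_rotated.norm(), p.norm())&lt;br /&gt;
print(p_rotated.angle(q))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;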
&lt;br /&gt;
=== Manipulating the structure ===&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures? ====&lt;br /&gt;
&lt;br /&gt;
Surprisingly, this is done using the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object.&lt;br /&gt;
This object calculates the rotation and translation that put&lt;br /&gt;
two lists of atoms on top of each other in such a way that their RMSD&lt;br /&gt;
is minimized. Of course, the two lists need to contain the same number&lt;br /&gt;
of atoms. The &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object can also apply the rotation/translation&lt;br /&gt;
to a list of atoms. The rotation and translation are stored as a tuple&lt;br /&gt;
in the &amp;lt;code&amp;gt;rotran&amp;lt;/code&amp;gt; attribute of the &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; object&lt;br /&gt;
(note that the rotation is right multiplying!). The RMSD is stored&lt;br /&gt;
in the &amp;lt;code&amp;gt;rms&amp;lt;/code&amp;gt; attribute.&lt;br /&gt;
&lt;br /&gt;
The algorithm used by &amp;lt;code&amp;gt;Superimposer&amp;lt;/code&amp;gt; comes from ''Matrix&lt;br /&gt;
computations, 2nd ed.'' Golub, G. &amp;amp; Van Loan (1989) and makes use&lt;br /&gt;
of singular value decomposition (this is implemented in the general&lt;br /&gt;
&amp;lt;code&amp;gt;Bio.SVDSuperimposer&amp;lt;/code&amp;gt; module).&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
# Specify the atom lists&lt;br /&gt;
# 'fixed' and 'moving' are lists of Atom objects&lt;br /&gt;
# The moving atoms will be put on the fixed atoms&lt;br /&gt;
sup.set_atoms(fixed, moving)&lt;br /&gt;
# Print rotation/translation/rms&lt;br /&gt;
print(sup.rotran)&lt;br /&gt;
print(sup.rms)&lt;br /&gt;
# Apply rotation/translation to the moving atoms&lt;br /&gt;
sup.apply(moving)&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I superimpose two structures based on their active sites? ====&lt;br /&gt;
&lt;br /&gt;
Pretty easily. Use the active site atoms to calculate the rotation/translation&lt;br /&gt;
matrices (see above), and apply these to the whole molecule.&lt;br /&gt;
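&lt;br /&gt;
The idea can be sketched with made-up coordinates: fit on a subset of atoms, then apply the&lt;br /&gt;
result to all atoms. Here the first four atoms play the role of the active site, and the&lt;br /&gt;
helper &amp;lt;code&amp;gt;make_atoms&amp;lt;/code&amp;gt; is purely illustrative; with real data you would select the&lt;br /&gt;
active site atoms from your parsed structures instead.&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
from Bio.PDB import Superimposer&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
&lt;br /&gt;
def make_atoms(coords):&lt;br /&gt;
    # Wrap each xyz triple in a bare CA Atom object (illustration only)&lt;br /&gt;
    return [&lt;br /&gt;
        Atom('CA', np.array(xyz, dtype='f'), 0.0, 1.0, ' ', ' CA ', serial, 'C')&lt;br /&gt;
        for serial, xyz in enumerate(coords, start=1)&lt;br /&gt;
    ]&lt;br /&gt;
&lt;br /&gt;
# Made-up coordinates; the moving molecule is a rotated, shifted copy&lt;br /&gt;
fixed_xyz = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],&lt;br /&gt;
                      [0.0, 1.5, 1.0], [3.0, 1.0, 2.0], [4.0, 2.5, 1.0]])&lt;br /&gt;
angle = np.pi / 6&lt;br /&gt;
rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],&lt;br /&gt;
                     [np.sin(angle), np.cos(angle), 0.0],&lt;br /&gt;
                     [0.0, 0.0, 1.0]])&lt;br /&gt;
moving_xyz = fixed_xyz.dot(rotation) + np.array([5.0, -2.0, 1.0])&lt;br /&gt;
&lt;br /&gt;
fixed_atoms = make_atoms(fixed_xyz)&lt;br /&gt;
moving_atoms = make_atoms(moving_xyz)&lt;br /&gt;
&lt;br /&gt;
# Fit on the 'active site' atoms only&lt;br /&gt;
sup = Superimposer()&lt;br /&gt;
sup.set_atoms(fixed_atoms[:4], moving_atoms[:4])&lt;br /&gt;
&lt;br /&gt;
# Apply the active site transformation to the whole moving molecule&lt;br /&gt;
sup.apply(moving_atoms)&lt;br /&gt;
&lt;br /&gt;
# Subtracting two Atom objects gives the distance between them&lt;br /&gt;
deviations = [f - m for f, m in zip(fixed_atoms, moving_atoms)]&lt;br /&gt;
print(sup.rms, max(deviations))&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Because the moving molecule here is an exact rigid copy, all deviations come out close to zero; for two genuinely different proteins only the active site region is expected to overlap well.&lt;br /&gt;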
&lt;br /&gt;
==== Can I manipulate the atomic coordinates? ====&lt;br /&gt;
&lt;br /&gt;
Yes, using the &amp;lt;code&amp;gt;transform&amp;lt;/code&amp;gt; method of the &amp;lt;code&amp;gt;Atom&amp;lt;/code&amp;gt; object,&lt;br /&gt;
or directly using the &amp;lt;code&amp;gt;set_coord&amp;lt;/code&amp;gt; method.&lt;br /&gt;
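&lt;br /&gt;
A minimal sketch of both approaches on a single hand-made atom (the coordinates, rotation and&lt;br /&gt;
translation are arbitrary illustrations; with a parsed structure you would use its atoms):&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
from Bio.PDB.Atom import Atom&lt;br /&gt;
&lt;br /&gt;
# A bare atom with made-up coordinates (illustration only)&lt;br /&gt;
atom = Atom('CA', np.array([1.0, 0.0, 0.0], dtype='f'), 0.0, 1.0, ' ', ' CA ', 1, 'C')&lt;br /&gt;
&lt;br /&gt;
# transform() right-multiplies the coordinates by the rotation matrix,&lt;br /&gt;
# then adds the translation vector&lt;br /&gt;
rotation = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])&lt;br /&gt;
translation = np.array([0.0, 0.0, 5.0])&lt;br /&gt;
atom.transform(rotation, translation)&lt;br /&gt;
transformed = atom.get_coord().copy()&lt;br /&gt;
&lt;br /&gt;
# set_coord() overwrites the coordinates directly&lt;br /&gt;
atom.set_coord(np.array([2.0, 2.0, 2.0], dtype='f'))&lt;br /&gt;
print(transformed, atom.get_coord())&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;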
&lt;br /&gt;
== Other Structural Bioinformatics modules ==&lt;br /&gt;
&lt;br /&gt;
==== Bio.SCOP ====&lt;br /&gt;
&lt;br /&gt;
See the main Biopython tutorial.&lt;br /&gt;
&lt;br /&gt;
==== Bio.FSSP ====&lt;br /&gt;
&lt;br /&gt;
No documentation available yet.&lt;br /&gt;
&lt;br /&gt;
== You haven't answered my question yet! ==&lt;br /&gt;
&lt;br /&gt;
Woah! It's late and I'm tired, and a glass of excellent ''Pedro&lt;br /&gt;
Ximenez'' sherry is waiting for me. Just drop me a mail, and I'll answer&lt;br /&gt;
you in the morning (with a bit of luck...).&lt;br /&gt;
&lt;br /&gt;
== Contributors ==&lt;br /&gt;
&lt;br /&gt;
The main author/maintainer of Bio.PDB is:&lt;br /&gt;
&lt;br /&gt;
 Thomas Hamelryck&lt;br /&gt;
 Bioinformatics center&lt;br /&gt;
 Institute of Molecular Biology&lt;br /&gt;
 University of Copenhagen&lt;br /&gt;
 Universitetsparken 15, Bygning 10&lt;br /&gt;
 DK-2100 København Ø&lt;br /&gt;
 Denmark&lt;br /&gt;
 thamelry@binf.ku.dk&lt;br /&gt;
&lt;br /&gt;
Kristian Rother donated code to interact with the PDB database, and to parse the PDB&lt;br /&gt;
header. Indraneel Majumdar sent in some bug reports and assisted in&lt;br /&gt;
coding the &amp;lt;code&amp;gt;Polypeptide&amp;lt;/code&amp;gt; module. Many thanks to Brad Chapman,&lt;br /&gt;
Jeffrey Chang, Andrew Dalke and Iddo Friedberg for suggestions, comments,&lt;br /&gt;
help and/or biting criticism :-).&lt;br /&gt;
&lt;br /&gt;
== Can I contribute? ==&lt;br /&gt;
&lt;br /&gt;
Yes, yes, yes! Just send me an e-mail (thamelry@binf.ku.dk) or to the Biopython developers (biopython-dev@biopython.org) if you&lt;br /&gt;
have something useful to contribute! Eternal fame awaits!&lt;/div&gt;</summary>
		<author><name>Mdehoon</name></author>	</entry>

	<entry>
		<id>http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ</id>
		<title>The Biopython Structural Bioinformatics FAQ</title>
		<link rel="alternate" type="text/html" href="http://www.biopython.org/wiki/The_Biopython_Structural_Bioinformatics_FAQ"/>
				<updated>2013-03-26T03:33:25Z</updated>
		
		<summary type="html">&lt;p&gt;Mdehoon: /* How do I calculate residue depth? */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is a Biopython module that focuses on working with crystal&lt;br /&gt;
structures of biological macromolecules. This document gives a fairly&lt;br /&gt;
complete overview of Bio.PDB.&lt;br /&gt;
&lt;br /&gt;
== Bio.PDB's installation ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB is automatically installed as part of Biopython. Biopython&lt;br /&gt;
can be obtained from http://www.biopython.org. It runs on many&lt;br /&gt;
platforms (Linux/Unix, Windows, Mac, ...).&lt;br /&gt;
&lt;br /&gt;
== Who's using Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB was used in the construction of DISEMBL, a web server that&lt;br /&gt;
predicts disordered regions in proteins (http://dis.embl.de/),&lt;br /&gt;
and COLUMBA, a website that provides annotated protein structures&lt;br /&gt;
(http://www.columba-db.de/). Bio.PDB has also been used to&lt;br /&gt;
perform a large scale search for active site similarities between&lt;br /&gt;
protein structures in the PDB (see [http://dx.doi.org/10.1002/prot.10338 ''Proteins Struct. Func.&lt;br /&gt;
Gen.'', '''2003''', 51, 96-108]), and to develop a new algorithm&lt;br /&gt;
that identifies linear secondary structure elements ([http://www.biomedcentral.com/1471-2105/6/202 ''BMC Bioinformatics'',&lt;br /&gt;
'''2005''', 6, 202]).&lt;br /&gt;
&lt;br /&gt;
Judging from requests for features and information, Bio.PDB is also&lt;br /&gt;
used by several LPCs (Large Pharmaceutical Companies :-).&lt;br /&gt;
&lt;br /&gt;
== Is there a Bio.PDB reference? ==&lt;br /&gt;
&lt;br /&gt;
Yes, and I'd appreciate it if you would refer to Bio.PDB in publications&lt;br /&gt;
if you make use of it. The reference is:&lt;br /&gt;
&lt;br /&gt;
Hamelryck, T., Manderick, B. (2003) PDB parser and structure class&lt;br /&gt;
implemented in Python. ''Bioinformatics'', '''19''', 2308-2310.&lt;br /&gt;
&lt;br /&gt;
The article can be freely downloaded via the Bioinformatics journal&lt;br /&gt;
website (http://www.binf.ku.dk/users/thamelry/references.html).&lt;br /&gt;
I welcome e-mails telling me what you are using Bio.PDB for. Feature&lt;br /&gt;
requests are welcome too.&lt;br /&gt;
&lt;br /&gt;
== How well tested is Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Pretty well, actually. Bio.PDB has been extensively tested on nearly&lt;br /&gt;
5500 structures from the PDB - all structures seemed to be parsed&lt;br /&gt;
correctly. More details can be found in the Bio.PDB Bioinformatics&lt;br /&gt;
article. Bio.PDB has been used/is being used in many research projects&lt;br /&gt;
as a reliable tool. In fact, I'm using Bio.PDB almost daily for research&lt;br /&gt;
purposes and continue working on improving it and adding new features.&lt;br /&gt;
&lt;br /&gt;
== How fast is it? ==&lt;br /&gt;
&lt;br /&gt;
The &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; performance was tested on about 800 structures&lt;br /&gt;
(each belonging to a unique SCOP superfamily). This takes about 20&lt;br /&gt;
minutes, or on average 1.5 seconds per structure. Parsing the structure&lt;br /&gt;
of the large ribosomal subunit (1FKK), which contains about 64000&lt;br /&gt;
atoms, takes 10 seconds on a 1000 MHz PC. In short: it's more than&lt;br /&gt;
fast enough for many applications.&lt;br /&gt;
&lt;br /&gt;
== Why should I use Bio.PDB? ==&lt;br /&gt;
&lt;br /&gt;
Bio.PDB might be exactly what you want, and then again it might not.&lt;br /&gt;
If you are interested in data mining the PDB header, you might want&lt;br /&gt;
to look elsewhere because there is only limited support for this.&lt;br /&gt;
If you look for a powerful, complete data structure to access the&lt;br /&gt;
atomic data Bio.PDB is probably for you. &lt;br /&gt;
&lt;br /&gt;
== Usage ==&lt;br /&gt;
&lt;br /&gt;
=== General questions ===&lt;br /&gt;
&lt;br /&gt;
==== Importing Bio.PDB ====&lt;br /&gt;
&lt;br /&gt;
That's simple:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
from Bio.PDB import *&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Is there support for molecular graphics? ====&lt;br /&gt;
&lt;br /&gt;
Not directly, mostly since there are quite a few Python based/Python&lt;br /&gt;
aware solutions already, that can potentially be used with Bio.PDB.&lt;br /&gt;
My choice is Pymol, BTW (I've used this successfully with Bio.PDB,&lt;br /&gt;
and there will probably be specific PyMol modules in Bio.PDB soon/some&lt;br /&gt;
day). Python based/aware molecular graphics solutions include:&lt;br /&gt;
&lt;br /&gt;
* PyMol: http://pymol.sourceforge.net/&lt;br /&gt;
* Chimera: http://www.cgl.ucsf.edu/chimera/&lt;br /&gt;
* PMV: http://www.scripps.edu/~sanner/python/&lt;br /&gt;
* Coot: http://www.ysbl.york.ac.uk/~emsley/coot/&lt;br /&gt;
* CCP4mg: http://www.ysbl.york.ac.uk/~lizp/molgraphics.html&lt;br /&gt;
* mmLib: http://pymmlib.sourceforge.net/ &lt;br /&gt;
* VMD: http://www.ks.uiuc.edu/Research/vmd/&lt;br /&gt;
* MMTK: http://starship.python.net/crew/hinsen/MMTK/&lt;br /&gt;
&lt;br /&gt;
I'd be crazy to write another molecular graphics application (been&lt;br /&gt;
there - done that, actually :-).&lt;br /&gt;
&lt;br /&gt;
=== Input/output ===&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from a PDB file? ====&lt;br /&gt;
&lt;br /&gt;
First, create a &amp;lt;code&amp;gt;PDBParser&amp;lt;/code&amp;gt; object:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = PDBParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Then, create a structure object from a PDB file in the following way&lt;br /&gt;
(the PDB file in this case is called '1FAT.pdb', 'PHA-L' is a user&lt;br /&gt;
defined name for the structure):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.pdb')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== How do I create a structure object from an mmCIF file? ====&lt;br /&gt;
&lt;br /&gt;
As with PDB files, first create an &amp;lt;code&amp;gt;MMCIFParser&amp;lt;/code&amp;gt;&lt;br /&gt;
object:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
parser = MMCIFParser()&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
Then use this parser to create a structure object from the mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
structure = parser.get_structure('PHA-L', '1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== ...and what about the new PDB XML format? ====&lt;br /&gt;
&lt;br /&gt;
That's not yet supported, but I'm definitely planning to support that&lt;br /&gt;
in the future (it's not a lot of work). Contact me if you need this,&lt;br /&gt;
it might encourage me :-).&lt;br /&gt;
&lt;br /&gt;
==== I'd like to have some more low level access to an mmCIF file... ====&lt;br /&gt;
&lt;br /&gt;
You got it. You can create a python dictionary that maps all mmCIF&lt;br /&gt;
tags in an mmCIF file to their values. If there are multiple values&lt;br /&gt;
(like in the case of tag &amp;lt;code&amp;gt;_atom_site.Cartn_y&amp;lt;/code&amp;gt;, which holds&lt;br /&gt;
the ''y'' coordinates of all atoms), the tag is mapped to a list of values.&lt;br /&gt;
The dictionary is created from the mmCIF file as follows:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
mmcif_dict = MMCIF2Dict('1FAT.cif')&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the solvent content from an mmCIF file:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
sc = mmcif_dict['_exptl_crystal.density_percent_sol']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Example: get the list of the ''y'' coordinates of all atoms&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
y_list = mmcif_dict['_atom_site.Cartn_y']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==== Can I access the header information? ====&lt;br /&gt;
&lt;br /&gt;
Thanks to Kristian Rother you can access some information from the&lt;br /&gt;
PDB header. Note however that many PDB files contain headers with&lt;br /&gt;
incomplete or erroneous information. Many of the errors have been&lt;br /&gt;
fixed in the equivalent mmCIF files.&lt;br /&gt;
'''Hence, if you are interested in the header information, it is a good idea to extract information from mmCIF files using the &amp;lt;code&amp;gt;MMCIF2Dict&amp;lt;/code&amp;gt; tool described above, instead of parsing the PDB header.''' &lt;br /&gt;
&lt;br /&gt;
Now that is clarified, let's return to parsing the PDB header. The&lt;br /&gt;
structure object has an attribute called &amp;lt;code&amp;gt;header&amp;lt;/code&amp;gt; which is&lt;br /&gt;
a python dictionary that maps header records to their values.&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&amp;lt;python&amp;gt;&lt;br /&gt;
resolution = structure.header['resolution']&lt;br /&gt;
keywords = structure.header['keywords']&lt;br /&gt;
&amp;lt;/python&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The available keys are &amp;lt;code&amp;gt;name&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;head&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;deposition_date&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;release_date&amp;lt;/code&amp;gt;,&lt;br /&gt;
&amp;lt;code&amp;gt;structure_method&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;resolution&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;structure_reference&amp;lt;/code&amp;gt; (maps to&lt;br /&gt;
a list of references), &amp;lt;code&amp;gt;journal_reference&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;autho