Code to work with GenBank formatted files.

Rather than using Bio.GenBank, you are now encouraged to use Bio.SeqIO with
the "genbank" or "embl" format names to parse GenBank or EMBL files into
SeqRecord and SeqFeature objects (see the Biopython tutorial for details).

Using Bio.GenBank directly to parse GenBank files is only useful if you want
to obtain GenBank-specific Record objects, which are a much closer
representation of the raw file contents than the SeqRecord alternative from
the FeatureParser (used in Bio.SeqIO).

To use the Bio.GenBank parser, there are two helper functions:

read                   Parse a handle containing a single GenBank record
                       as a Bio.GenBank specific Record object.
parse                  Iterate over a handle containing multiple GenBank
                       records as Bio.GenBank specific Record objects.

The following internal classes are not intended for direct use and may
be deprecated in a future release.

Classes:

Iterator               Iterate through a file of GenBank entries.
ErrorFeatureParser     Catch errors caused during parsing.
FeatureParser          Parse GenBank data into SeqRecord and SeqFeature objects.
RecordParser           Parse GenBank data into a Record object.

Exceptions:

ParserFailureError     Exception indicating a failure in the parser (i.e. the
                       scanner or consumer).
LocationParserError    Exception indicating a problem with the spark-based
                       location parser.

Iterator               Iterator interface to move over a file of GenBank entries one at a time (OBSOLETE).
ParserFailureError     Failure caused by some kind of problem in the parser.
LocationParserError    Could not properly parse out a location from a GenBank file.
FeatureParser          Parse GenBank files into Seq + Feature objects (OBSOLETE).
RecordParser           Parse GenBank files into Record objects (OBSOLETE).
_BaseGenBankConsumer   Abstract GenBank consumer providing useful general functions (PRIVATE).
_FeatureConsumer       Create a SeqRecord object with Features to return (PRIVATE).
_RecordConsumer        Create a GenBank Record object from scanner-generated information (PRIVATE).

GENBANK_INDENT = 12
GENBANK_SPACER =
FEATURE_KEY_INDENT = 5
FEATURE_QUALIFIER_INDENT = 21
FEATURE_KEY_SPACER =
FEATURE_QUALIFIER_SPACER =
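
As a rough illustration of what the indent constants above describe, the following sketch assumes they mirror the column layout of a GenBank flat file: header keywords occupy the first 12 columns, feature keys start after 5 spaces, and feature locations and qualifiers start at index 21. The helper names `split_header_line` and `split_feature_line` are hypothetical and are not part of Bio.GenBank, and the sample lines are toy data built for the example.

```python
# Column widths copied from the constants listed above.
GENBANK_INDENT = 12
FEATURE_KEY_INDENT = 5
FEATURE_QUALIFIER_INDENT = 21


def split_header_line(line):
    """Split a header line into (keyword, data) at the header indent."""
    keyword = line[:GENBANK_INDENT].strip()
    data = line[GENBANK_INDENT:].strip()
    return keyword, data


def split_feature_line(line):
    """Split a feature table line into (key, data).

    Qualifier and continuation lines have a blank key column, so the
    returned key is an empty string for them.
    """
    key = line[FEATURE_KEY_INDENT:FEATURE_QUALIFIER_INDENT].strip()
    data = line[FEATURE_QUALIFIER_INDENT:].strip()
    return key, data


# Toy lines assembled from the constants so the columns line up.
key_width = FEATURE_QUALIFIER_INDENT - FEATURE_KEY_INDENT
header = "LOCUS".ljust(GENBANK_INDENT) + "NC_000932"
feature = " " * FEATURE_KEY_INDENT + "CDS".ljust(key_width) + "123..456"
qualifier = " " * FEATURE_QUALIFIER_INDENT + '/note="example"'

print(split_header_line(header))
print(split_feature_line(feature))
print(split_feature_line(qualifier))
```

Slicing by fixed columns rather than splitting on whitespace keeps qualifier values that contain spaces intact.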

_solo_location =
_pair_location =
_between_location =
_within_position =
_re_within_position = re.compile(r'\(\d
_within_location =
_oneof_position =
_re_oneof_position = re.compile(r'one-of\(\d
_oneof_location =
_simple_location =
_re_simple_location = re.compile(r'^\d
_re_simple_compound = re.compile(r'^
_complex_location =
_re_complex_location = re.compile(r'^
_possibly_complemented_complex_location =
_re_complex_compound = re.compile(r'^

_pos  Build a Position object (PRIVATE).
For an end position, leave offset as zero (default):
>>> _pos("5")
ExactPosition(5)
For a start position, set offset to minus one (for Python counting):
>>> _pos("5", -1)
ExactPosition(4)
This also covers fuzzy positions:
>>> p = _pos("<5")
>>> p
BeforePosition(5)
>>> print(p)
<5
>>> int(p)
5
>>> _pos(">5")
AfterPosition(5)
By default assumes an end position, so note the integer behaviour:
>>> p = _pos("one-of(5,8,11)")
>>> p
OneOfPosition(11, choices=[ExactPosition(5), ExactPosition(8), ExactPosition(11)])
>>> print(p)
one-of(5,8,11)
>>> int(p)
11
>>> _pos("(8.10)")
WithinPosition(10, left=8, right=10)
Fuzzy start positions:
>>> p = _pos("<5", -1)
>>> p
BeforePosition(4)
>>> print(p)
<4
>>> int(p)
4
Notice how the integer behaviour changes too!
>>> p = _pos("one-of(5,8,11)", -1)
>>> p
OneOfPosition(4, choices=[ExactPosition(4), ExactPosition(7), ExactPosition(10)])
>>> print(p)
one-of(4,7,10)
>>> int(p)
4
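
The offset and integer behaviour shown in the examples above can be mimicked without Biopython. The sketch below is not the `_pos` implementation: `sketch_pos` is a hypothetical helper that returns a plain `(kind, integer)` tuple instead of a Position object, and it assumes the offset is either 0 (end position) or -1 (start position).

```python
import re


def sketch_pos(pos_str, offset=0):
    """Return (kind, integer_value) for a GenBank position string.

    offset=0 treats the string as an end position and offset=-1 as a
    start position, following the convention described above.
    """
    if pos_str.startswith("<"):
        return "before", int(pos_str[1:]) + offset
    if pos_str.startswith(">"):
        return "after", int(pos_str[1:]) + offset

    match = re.fullmatch(r"one-of\((\d+(?:,\d+)+)\)", pos_str)
    if match:
        choices = [int(value) + offset for value in match.group(1).split(",")]
        # A start position collapses to the smallest choice and an end
        # position to the largest, as in the documented examples.
        value = min(choices) if offset == -1 else max(choices)
        return "one-of", value

    match = re.fullmatch(r"\((\d+)\.(\d+)\)", pos_str)
    if match:
        left = int(match.group(1)) + offset
        right = int(match.group(2)) + offset
        # Same idea for a within position: left edge for a start,
        # right edge for an end.
        value = left if offset == -1 else right
        return "within", value

    return "exact", int(pos_str) + offset


print(sketch_pos("one-of(5,8,11)"))
print(sketch_pos("one-of(5,8,11)", -1))
```

Taking the outermost value in each case means a feature built from fuzzy positions spans the widest region its location string allows.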

_loc  Build a FeatureLocation from a non-compound, non-complement location (PRIVATE).
Simple examples:
>>> _loc("123..456", 1000, +1)
FeatureLocation(ExactPosition(122), ExactPosition(456), strand=1)
>>> _loc("<123..>456", 1000, strand=-1)
FeatureLocation(BeforePosition(122), AfterPosition(456), strand=-1)
A more complex location using within positions:
>>> _loc("(9.10)..(20.25)", 1000, 1)
FeatureLocation(WithinPosition(8, left=8, right=9), WithinPosition(25, left=20, right=25), strand=1)
Notice how that will act as though it has overall start 8 and end 25.
A zero-length between feature:
>>> _loc("123^124", 1000, 0)
FeatureLocation(ExactPosition(123), ExactPosition(123), strand=0)
The expected sequence length is needed for a special case, a between
position at the start/end of a circular genome:
>>> _loc("1000^1", 1000, 1)
FeatureLocation(ExactPosition(1000), ExactPosition(1000), strand=1)
Apart from this special case, between positions P^Q must have P+1==Q:
>>> _loc("123^456", 1000, 1)
Traceback (most recent call last):
...
ValueError: Invalid between location '123^456'
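
The coordinate rules in these examples can be sketched in a few lines. This is an illustration under stated assumptions, not the `_loc` implementation: `sketch_simple` and `sketch_between` are hypothetical helpers that return plain `(start, end)` integer tuples, handle only exact positions, and ignore the strand.

```python
def sketch_simple(loc_str):
    """Return (start, end) for an exact "P..Q" location."""
    start_str, end_str = loc_str.split("..")
    # GenBank counts from one and includes the end base; Python slices
    # count from zero and exclude the end, so only the start shifts.
    return int(start_str) - 1, int(end_str)


def sketch_between(loc_str, expected_seq_length):
    """Return (start, end) for a between location "P^Q"."""
    p_str, q_str = loc_str.split("^")
    p, q = int(p_str), int(q_str)
    if p + 1 == q:
        # Zero-length feature sitting between bases P and Q.
        return p, p
    if p == expected_seq_length and q == 1:
        # Special case: the site between the last and first base of a
        # circular sequence.
        return p, p
    raise ValueError("Invalid between location %r" % loc_str)


print(sketch_simple("123..456"))
print(sketch_between("123^124", 1000))
print(sketch_between("1000^1", 1000))
```

The sequence length is the only way to tell the circular wrap-around case apart from an invalid pair such as `123^456`.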

_split_compound_loc  Split a tricky compound location string (PRIVATE).
>>> list(_split_compound_loc("123..145"))
['123..145']
>>> list(_split_compound_loc("123..145,200..209"))
['123..145', '200..209']
>>> list(_split_compound_loc("one-of(200,203)..300"))
['one-of(200,203)..300']
>>> list(_split_compound_loc("complement(123..145),200..209"))
['complement(123..145)', '200..209']
>>> list(_split_compound_loc("123..145,one-of(200,203)..209"))
['123..145', 'one-of(200,203)..209']
>>> list(_split_compound_loc("123..145,one-of(200,203)..one-of(209,211),300"))
['123..145', 'one-of(200,203)..one-of(209,211)', '300']
>>> list(_split_compound_loc("123..145,complement(one-of(200,203)..one-of(209,211)),300"))
['123..145', 'complement(one-of(200,203)..one-of(209,211))', '300']
>>> list(_split_compound_loc("123..145,200..one-of(209,211),300"))
['123..145', '200..one-of(209,211)', '300']
>>> list(_split_compound_loc("123..145,200..one-of(209,211)"))
['123..145', '200..one-of(209,211)']
>>> list(_split_compound_loc("complement(149815..150200),complement(293787..295573),NC_016402.1:6618..6676,181647..181905"))
['complement(149815..150200)', 'complement(293787..295573)', 'NC_016402.1:6618..6676', '181647..181905']
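
One way to get the same splits as in the examples above is to break the string only at commas that sit outside any parentheses. The sketch below uses that parenthesis-depth technique, which is a substitute chosen for clarity and may differ from how `_split_compound_loc` actually works; `split_top_level` is a hypothetical name.

```python
def split_top_level(compound_loc):
    """Yield the parts of a compound location string.

    Commas nested inside parentheses, as in one-of(...) or
    complement(...), are kept as part of the current piece.
    """
    depth = 0
    start = 0
    for index, char in enumerate(compound_loc):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif char == "," and depth == 0:
            yield compound_loc[start:index]
            start = index + 1
    # Whatever follows the last top-level comma is the final piece.
    yield compound_loc[start:]


print(list(split_top_level("123..145,one-of(200,203)..one-of(209,211),300")))
```

A plain `str.split(",")` would cut inside `one-of(200,203)`, which is why the nesting depth has to be tracked.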

parse  Iterate over GenBank formatted entries as Record objects.
>>> from Bio import GenBank
>>> with open("GenBank/NC_000932.gb") as handle:
...     for record in GenBank.parse(handle):
...         print(record.accession)
['NC_000932']
To get SeqRecord objects use Bio.SeqIO.parse(..., format="gb")
instead.

read  Read a handle containing a single GenBank entry as a Record object.
>>> from Bio import GenBank
>>> with open("GenBank/NC_000932.gb") as handle:
...     record = GenBank.read(handle)
>>> print(record.accession)
['NC_000932']
To get a SeqRecord object use Bio.SeqIO.read(..., format="gb")
instead.
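
The difference between iterating over many entries and reading exactly one can be sketched without Biopython. The code below is an illustration, not the Bio.GenBank implementation: `iter_raw_records` and `read_single` are hypothetical helpers, each record is returned as a plain list of lines, and it is an assumption of this sketch that the single-record helper should reject a handle holding zero or several entries. It relies only on the GenBank convention that every entry ends with a `//` line, and the sample text is toy data.

```python
import io


def iter_raw_records(handle):
    """Yield each entry in the handle as a list of lines."""
    lines = []
    for line in handle:
        lines.append(line.rstrip("\n"))
        if line.startswith("//"):
            # The terminator line closes the current entry.
            yield lines
            lines = []


def read_single(handle):
    """Return the only entry in the handle, or raise ValueError."""
    records = list(iter_raw_records(handle))
    if len(records) != 1:
        raise ValueError("Expected exactly one record, found %d" % len(records))
    return records[0]


# Toy input with two minimal entries (not real GenBank records).
two_entries = "LOCUS       FIRST\n//\nLOCUS       SECOND\n//\n"

for raw in iter_raw_records(io.StringIO(two_entries)):
    print(raw[0])

print(read_single(io.StringIO("LOCUS       FIRST\n//\n")))
```

The iterator form handles one entry at a time, which suits large multi-record files, while the single-record form fails loudly when the input is not what the caller expected.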
| Generated by Epydoc 3.0.1 on Tue Feb 5 17:59:45 2013 | http://epydoc.sourceforge.net |