| Trees | Indices | Help |
|
|---|
|
|
object --+
|
SeqFeature
Represent a Sequence Feature on an object. Attributes: o location - the location of the feature on the sequence (FeatureLocation) o type - the specified type of the feature (ie. CDS, exon, repeat...) o location_operator - a string specifying how this SeqFeature may be related to others. For example, in the example GenBank feature shown below, the location_operator would be "join" o strand - A value specifying on which strand (of a DNA sequence, for instance) the feature deals with. 1 indicates the plus strand, -1 indicates the minus strand, 0 indicates stranded but unknown (? in GFF3), while the default of None indicates that strand doesn't apply (dot in GFF3, e.g. features on proteins). Note this is a shortcut for accessing the strand property of the feature's location. o id - A string identifier for the feature. o ref - A reference to another sequence. This could be an accession number for some different sequence. Note this is a shortcut for the reference property of the feature's location. o ref_db - A different database for the reference accession number. Note this is a shortcut for the reference property of the location o qualifiers - A dictionary of qualifiers on the feature. These are analogous to the qualifiers from a GenBank feature table. The keys of the dictionary are qualifier names, the values are the qualifier values. o sub_features - Additional SeqFeatures which fall under this 'parent' feature. For instance, if we having something like: CDS join(1..10,30..40,50..60) Then the top level feature would be of type 'CDS' from 1 to 60 (actually 0 to 60 in Python counting) with location_operator='join', and the three sub- features would also be of type 'CDS', and would be from 1 to 10, 30 to 40 and 50 to 60, respectively (although actually using Python counting). To get the nucleotide sequence for this CDS, you would need to take the parent sequence and do seq[0:10]+seq[29:40]+seq[49:60] (Python counting). Things are more complicated with strands and fuzzy positions. To save you dealing with all these special cases, the SeqFeature provides an extract method to do this for you.
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
|||
|
Inherited from |
|||
|
|||
|
strand Feature's strand |
|||
|
ref Feature location reference (e.g. |
|||
|
ref_db Feature location reference's database. |
|||
|
Inherited from |
|||
|
|||
Initialize a SeqFeature on a Sequence. location can either be a FeatureLocation (with strand argument also given if required), or None. e.g. With no strand, on the forward strand, and on the reverse strand: >>> from Bio.SeqFeature import SeqFeature, FeatureLocation >>> f1 = SeqFeature(FeatureLocation(5, 10), type="domain") >>> f1.strand == f1.location.strand == None True >>> f2 = SeqFeature(FeatureLocation(7, 110, strand=1), type="CDS") >>> f2.strand == f2.location.strand == +1 True >>> f3 = SeqFeature(FeatureLocation(9, 108, strand=-1), type="CDS") >>> f3.strand == f3.location.strand == -1 True An invalid strand will trigger an exception: >>> f4 = SeqFeature(FeatureLocation(50, 60), strand=2) Traceback (most recent call last): ... ValueError: Strand should be +1, -1, 0 or None, not 2 Similarly if set via the FeatureLocation directly: >>> loc4 = FeatureLocation(50, 60, strand=2) Traceback (most recent call last): ... ValueError: Strand should be +1, -1, 0 or None, not 2 For exact start/end positions, an integer can be used (as shown above) as shorthand for the ExactPosition object. For non-exact locations, the FeatureLocation must be specified via the appropriate position objects.
|
A string representation of the record for debugging.
|
A readable summary of the feature intended to be printed to screen.
|
Returns a copy of the feature with its location shifted (PRIVATE). The annotation qaulifiers are copied. |
Returns a copy of the feature with its location flipped (PRIVATE). The argument length gives the length of the parent sequence. For example a location 0..20 (+1 strand) with parent length 30 becomes after flipping 10..30 (-1 strand). Strandless (None) or unknown strand (0) remain like that - just their end points are changed. The annotation qaulifiers are copied. |
Extract feature sequence from the supplied parent sequence.
The parent_sequence can be a Seq like object or a string, and will
generally return an object of the same type. The exception to this is
a MutableSeq as the parent sequence will return a Seq object.
This should cope with complex locations including complements, joins
and fuzzy positions. Even mixed strand features should work! This
also covers features on protein sequences (e.g. domains), although
here reverse strand features are not permitted.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8,15), type="domain")
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
Note - currently only sub-features of type "join" are supported.
|
Returns True regardless of the length of the feature. This behaviour is for backwards compatibility, since until the __len__ method was added, a SeqFeature always evaluated as True. Note that in comparison, Seq objects, strings, lists, etc, will all evaluate to False if they have length zero. WARNING: The SeqFeature may in future evaluate to False when its length is zero (in order to better match normal python behaviour)! |
Returns the length of the region described by a feature.
>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8,15), type="domain")
>>> len(f)
7
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
>>> len(f.extract(seq))
7
For simple features without subfeatures this is the same as the region
spanned (end position minus start position). However, for a feature
defined by combining several subfeatures (e.g. a CDS as the join of
several exons) the gaps are not counted (e.g. introns). This ensures
that len(f) == len(f.extract(parent_seq)), and also makes sure things
work properly with features wrapping the origin etc.
|
Iterate over the parent positions within the feature. The iteration order is strand aware, and can be thought of as moving along the feature using the parent sequence coordinates: >>> from Bio.SeqFeature import SeqFeature, FeatureLocation >>> f = SeqFeature(FeatureLocation(5,10), type="domain", strand=-1) >>> len(f) 5 >>> for i in f: print i 9 8 7 6 5 >>> list(f) [9, 8, 7, 6, 5] |
Check if an integer position is within the feature.
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5,10), type="domain", strand=-1)
>>> len(f)
5
>>> [i for i in range(15) if i in f]
[5, 6, 7, 8, 9]
For example, to see which features include a SNP position, you could
use this:
>>> from Bio import SeqIO
>>> record = SeqIO.read("GenBank/NC_000932.gb", "gb")
>>> for f in record.features:
... if 1750 in f:
... print f.type, f.location
source [0:154478](+)
gene [1716:4347](-)
tRNA [1716:4347](-)
Note that for a feature defined as a join of several subfeatures (e.g.
the union of several exons) the gaps are not checked (e.g. introns).
In this example, the tRNA location is defined in the GenBank file as
complement(join(1717..1751,4311..4347)), so that position 1760 falls
in the gap:
>>> for f in record.features:
... if 1760 in f:
... print f.type, f.location
source [0:154478](+)
gene [1716:4347](-)
Note that additional care may be required with fuzzy locations, for
example just before a BeforePosition:
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> from Bio.SeqFeature import BeforePosition
>>> f = SeqFeature(FeatureLocation(BeforePosition(3),8), type="domain")
>>> len(f)
5
>>> [i for i in range(10) if i in f]
[3, 4, 5, 6, 7]
|
|
|||
strandFeature's strand This is a shortcut for feature.location.strand
|
refFeature location reference (e.g. accession). This is a shortcut for feature.location.ref |
ref_dbFeature location reference's database. This is a shortcut for feature.location.ref_db
|
| Trees | Indices | Help |
|
|---|
| Generated by Epydoc 3.0.1 on Tue Feb 5 18:02:52 2013 | http://epydoc.sourceforge.net |