Package Bio :: Module SeqFeature :: Class SeqFeature
[hide private]
[frames] | no frames]

Class SeqFeature

source code

object --+
         |
        SeqFeature

Represent a Sequence Feature on an object.

Attributes:
o location - the location of the feature on the sequence (FeatureLocation)
o type - the specified type of the feature (ie. CDS, exon, repeat...)
o location_operator - a string specifying how this SeqFeature may
be related to others. For example, in the example GenBank feature
shown below, the location_operator would be "join". This is a proxy
for feature.location.operator and only applies to compound locations.
o strand - A value specifying on which strand (of a DNA sequence, for
instance) the feature deals with. 1 indicates the plus strand, -1
indicates the minus strand, 0 indicates stranded but unknown (? in GFF3),
while the default of None indicates that strand doesn't apply (dot in GFF3,
e.g. features on proteins). Note this is a shortcut for accessing the
strand property of the feature's location.
o id - A string identifier for the feature.
o ref - A reference to another sequence. This could be an accession
number for some different sequence. Note this is a shortcut for the
reference property of the feature's location.
o ref_db - A different database for the reference accession number.
Note this is a shortcut for the reference property of the location
o qualifiers - A dictionary of qualifiers on the feature. These are
analogous to the qualifiers from a GenBank feature table. The keys of
the dictionary are qualifier names, the values are the qualifier
values.
o sub_features - Obsolete list of additional SeqFeatures which was
used for holding compound locations (e.g. joins in GenBank/EMBL).
This is now superceded by a CompoundFeatureLocation as the location,
and should not be used (DEPRECATED).

Instance Methods [hide private]
 
__init__(self, location=None, type='', location_operator='', strand=None, id='<unknown id>', qualifiers=None, sub_features=None, ref=None, ref_db=None)
Initialize a SeqFeature on a Sequence.
source code
 
_get_sub_features(self) source code
 
_set_sub_features(self, value) source code
 
_get_strand(self) source code
 
_set_strand(self, value) source code
 
_get_ref(self) source code
 
_set_ref(self, value) source code
 
_get_ref_db(self) source code
 
_set_ref_db(self, value) source code
 
_get_location_operator(self) source code
 
_set_location_operator(self, value) source code
 
__repr__(self)
A string representation of the record for debugging.
source code
 
__str__(self)
A readable summary of the feature intended to be printed to screen.
source code
 
_shift(self, offset)
Returns a copy of the feature with its location shifted (PRIVATE).
source code
 
_flip(self, length)
Returns a copy of the feature with its location flipped (PRIVATE).
source code
 
extract(self, parent_sequence)
Extract feature sequence from the supplied parent sequence.
source code
 
__bool__(self)
Boolean value of an instance of this class (True).
source code
 
__nonzero__(self)
Boolean value of an instance of this class (True).
source code
 
__len__(self)
Returns the length of the region described by a feature.
source code
 
__iter__(self)
Iterate over the parent positions within the feature.
source code
 
__contains__(self, value)
Check if an integer position is within the feature.
source code

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties [hide private]
  sub_features
Obsolete representation of compound locations (DEPRECATED).
  strand
Feature's strand
  ref
Feature location reference (e.g.
  ref_db
Feature location reference's database.
  location_operator
Location operator for compound locations (e.g.

Inherited from object: __class__

Method Details [hide private]

__init__(self, location=None, type='', location_operator='', strand=None, id='<unknown id>', qualifiers=None, sub_features=None, ref=None, ref_db=None)
(Constructor)

source code 
Initialize a SeqFeature on a Sequence.

location can either be a FeatureLocation (with strand argument also
given if required), or None.

e.g. With no strand, on the forward strand, and on the reverse strand:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f1 = SeqFeature(FeatureLocation(5, 10), type="domain")
>>> f1.strand == f1.location.strand == None
True
>>> f2 = SeqFeature(FeatureLocation(7, 110, strand=1), type="CDS")
>>> f2.strand == f2.location.strand == +1
True
>>> f3 = SeqFeature(FeatureLocation(9, 108, strand=-1), type="CDS")
>>> f3.strand == f3.location.strand == -1
True

An invalid strand will trigger an exception:

>>> f4 = SeqFeature(FeatureLocation(50, 60), strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

Similarly if set via the FeatureLocation directly:

>>> loc4 = FeatureLocation(50, 60, strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

For exact start/end positions, an integer can be used (as shown above)
as shorthand for the ExactPosition object. For non-exact locations, the
FeatureLocation must be specified via the appropriate position objects.

Note that the strand, ref and ref_db arguments to the SeqFeature are
now obsolete and will be deprecated in a future release (which will
give warning messages) and later removed. Set them via the location
object instead.

Note that location_operator and sub_features arguments can no longer
be used, instead do this via the CompoundLocation object.

Overrides: object.__init__

__repr__(self)
(Representation operator)

source code 
A string representation of the record for debugging.

Overrides: object.__repr__

__str__(self)
(Informal representation operator)

source code 
A readable summary of the feature intended to be printed to screen.
        

Overrides: object.__str__

_shift(self, offset)

source code 
Returns a copy of the feature with its location shifted (PRIVATE).

The annotation qaulifiers are copied.

_flip(self, length)

source code 
Returns a copy of the feature with its location flipped (PRIVATE).

The argument length gives the length of the parent sequence. For
example a location 0..20 (+1 strand) with parent length 30 becomes
after flipping 10..30 (-1 strand). Strandless (None) or unknown
strand (0) remain like that - just their end points are changed.

The annotation qaulifiers are copied.

extract(self, parent_sequence)

source code 
Extract feature sequence from the supplied parent sequence.

The parent_sequence can be a Seq like object or a string, and will
generally return an object of the same type. The exception to this is
a MutableSeq as the parent sequence will return a Seq object.

This should cope with complex locations including complements, joins
and fuzzy positions. Even mixed strand features should work! This
also covers features on protein sequences (e.g. domains), although
here reverse strand features are not permitted.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8, 15), type="domain")
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())

Note - currently only sub-features of type "join" are supported.

__bool__(self)

source code 
Boolean value of an instance of this class (True).

This behaviour is for backwards compatibility, since until the
__len__ method was added, a SeqFeature always evaluated as True.

Note that in comparison, Seq objects, strings, lists, etc, will all
evaluate to False if they have length zero.

WARNING: The SeqFeature may in future evaluate to False when its
length is zero (in order to better match normal python behaviour)!

__nonzero__(self)
(Boolean test operator)

source code 
Boolean value of an instance of this class (True).

This behaviour is for backwards compatibility, since until the
__len__ method was added, a SeqFeature always evaluated as True.

Note that in comparison, Seq objects, strings, lists, etc, will all
evaluate to False if they have length zero.

WARNING: The SeqFeature may in future evaluate to False when its
length is zero (in order to better match normal python behaviour)!

__len__(self)
(Length operator)

source code 
Returns the length of the region described by a feature.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8, 15), type="domain")
>>> len(f)
7
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())
>>> len(f.extract(seq))
7

This is a proxy for taking the length of the feature's location:

>>> len(f.location)
7

For simple features this is the same as the region spanned (end
position minus start position using Pythonic counting). However, for
a compound location (e.g. a CDS as the join of several exons) the
gaps are not counted (e.g. introns). This ensures that len(f) matches
len(f.extract(parent_seq)), and also makes sure things work properly
with features wrapping the origin etc.

__iter__(self)

source code 
Iterate over the parent positions within the feature.

The iteration order is strand aware, and can be thought of as moving
along the feature using the parent sequence coordinates:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5, 10), type="domain", strand=-1)
>>> len(f)
5
>>> for i in f: print(i)
9
8
7
6
5
>>> list(f)
[9, 8, 7, 6, 5]

This is a proxy for iterating over the location,

>>> list(f.location)
[9, 8, 7, 6, 5]

__contains__(self, value)
(In operator)

source code 
Check if an integer position is within the feature.

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f = SeqFeature(FeatureLocation(5, 10), type="domain", strand=-1)
>>> len(f)
5
>>> [i for i in range(15) if i in f]
[5, 6, 7, 8, 9]

For example, to see which features include a SNP position, you could
use this:

>>> from Bio import SeqIO
>>> record = SeqIO.read("GenBank/NC_000932.gb", "gb")
>>> for f in record.features:
...     if 1750 in f:
...         print("%s %s" % (f.type, f.location))
source [0:154478](+)
gene [1716:4347](-)
tRNA join{[4310:4347](-), [1716:1751](-)}

Note that for a feature defined as a join of several subfeatures (e.g.
the union of several exons) the gaps are not checked (e.g. introns).
In this example, the tRNA location is defined in the GenBank file as
complement(join(1717..1751,4311..4347)), so that position 1760 falls
in the gap:

>>> for f in record.features:
...     if 1760 in f:
...         print("%s %s" % (f.type, f.location))
source [0:154478](+)
gene [1716:4347](-)

Note that additional care may be required with fuzzy locations, for
example just before a BeforePosition:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> from Bio.SeqFeature import BeforePosition
>>> f = SeqFeature(FeatureLocation(BeforePosition(3), 8), type="domain")
>>> len(f)
5
>>> [i for i in range(10) if i in f]
[3, 4, 5, 6, 7]

Note that is is a proxy for testing membership on the location.

>>> [i for i in range(10) if i in f.location]
[3, 4, 5, 6, 7]


Property Details [hide private]

sub_features

Obsolete representation of compound locations (DEPRECATED).

Get Method:
_get_sub_features(self)
Set Method:
_set_sub_features(self, value)

strand

Feature's strand

This is a shortcut for feature.location.strand

Get Method:
_get_strand(self)
Set Method:
_set_strand(self, value)

ref

Feature location reference (e.g. accession).

This is a shortcut for feature.location.ref

Get Method:
_get_ref(self)
Set Method:
_set_ref(self, value)

ref_db

Feature location reference's database.

This is a shortcut for feature.location.ref_db

Get Method:
_get_ref_db(self)
Set Method:
_set_ref_db(self, value)

location_operator

Location operator for compound locations (e.g. join).

Get Method:
_get_location_operator(self)
Set Method:
_set_location_operator(self, value)