Module easy

Bio.GFF.easy: some functions to ease the use of Biopython (DEPRECATED)

This is part of the "old" Bio.GFF module by Michael Hoffman, which offered
access to a MySQL database holding GFF data loaded by BioPerl. This code has
now been deprecated, and will probably be removed in order to free the Bio.GFF
namespace for a new GFF parser in Biopython (including GFF3 support).

Some of the more useful ideas of Bio.GFF.easy may be reworked for Bio.GenBank,
using the standard SeqFeature objects used elsewhere in Biopython.

Classes
  FeatureDict
JH: accessing feature.qualifiers as a list is stupid.
  Location
This is really best interfaced through LocationFromString.
fuzzy: < or >
join: {0 = no join, 1 = join, 2 = order}
  LocationJoin
>>> join = LocationJoin([LocationFromCoords(339, 564, 1), LocationFromString("complement(100..339)")])...
  LocationFromCoords
>>> print LocationFromCoords(339, 564)...
  LocationFromString
>>> # here are some tests from http://www.ncbi.nlm.nih.gov/collab/FT/index.html#location
>>> print LocationFromString("467")
467
>>> print LocationFromString("340..565")
340..565
>>> print LocationFromString("<345..500")
<345..500
>>> print LocationFromString("<1..888")
<1..888
>>> # (102.110) and 123^124 syntax unimplemented
>>> print LocationFromString("join(12..78,134..202)")
join(12..78,134..202)
>>> print LocationFromString("complement(join(2691..4571,4918..5163))")
complement(join(2691..4571,4918..5163))
>>> print LocationFromString("join(complement(4918..5163),complement(2691..4571))")
join(complement(4918..5163),complement(2691..4571))
>>> print LocationFromString("order(complement(4918..5163),complement(2691..4571))")
order(complement(4918..5163),complement(2691..4571))
>>> print LocationFromString("NC_001802x.fna:73..78")
NC_001802x.fna:73..78
>>> print LocationFromString("J00194:100..202")
J00194:100..202
Functions
  _hashname(location)
  open_file(filename)
  fasta_single(filename=None, string=None)
>>> record = fasta_single(string=''' ...
  fasta_multi(filename=None)
  fasta_readrecords(filename=None)
>>> records = fasta_readrecords('GFF/multi.fna')...
  fasta_write(filename, records)
  genbank_single(filename)
>>> record = genbank_single("GFF/NC_001422.gbk")...
  record_subseq(record, location, *args, **keywds)
>>> from Bio.SeqRecord import SeqRecord
>>> record = SeqRecord(Seq("gagttttatcgcttccatga"), ...
  record_sequence(record)
Returns the sequence of a record.
  record_coords(record, start, end, strand=0, upper=0)
>>> from Bio.SeqRecord import SeqRecord
>>> record = SeqRecord(Seq("gagttttatcgcttccatga"), ...
  _test()
Run the Bio.GFF.easy module's doctests (PRIVATE).
Variables
  re_complement = re.compile(r'^complement\((.*)\)$')
  re_seqname = re.compile(r'^(?!join|order|complement)([^:]+?):(...
  re_join = re.compile(r'^(join|order)\((.*)\)$')
  re_dotdot = re.compile(r'^([><]*\d+)\.\.([><]*\d+)$')
  re_fuzzy = re.compile(r'^([><])(\d+)')
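As an illustration of how re_dotdot and re_fuzzy can work together, the sketch below splits a simple range such as "<345..500" into two endpoints and records whether each one is fuzzy. It is a minimal sketch built only on the two patterns listed above; the helper names parse_endpoint and parse_range are hypothetical and are not part of Bio.GFF.easy.

```python
import re

# Patterns copied from the Variables list above.
re_dotdot = re.compile(r'^([><]*\d+)\.\.([><]*\d+)$')
re_fuzzy = re.compile(r'^([><])(\d+)')


def parse_endpoint(text):
    """Return (position, fuzzy), where fuzzy is '<', '>' or '' (hypothetical helper)."""
    match = re_fuzzy.match(text)
    if match:
        return int(match.group(2)), match.group(1)
    return int(text), ""


def parse_range(text):
    """Split 'a..b' into two parsed endpoints (hypothetical helper)."""
    match = re_dotdot.match(text)
    if match is None:
        raise ValueError("not a simple a..b range: %r" % text)
    return parse_endpoint(match.group(1)), parse_endpoint(match.group(2))


print(parse_range("<345..500"))  # ((345, '<'), (500, ''))
print(parse_range("340..565"))   # ((340, ''), (565, ''))
```

Strings that are not plain ranges, such as "join(12..78,134..202)", do not match re_dotdot and raise ValueError in this sketch; they would first need to be unwrapped with re_join or re_complement.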
Function Details

fasta_single(filename=None, string=None)

>>> record = fasta_single(string='''
... >gi|9629360|ref|NP_057850.1| Gag [Human immunodeficiency virus type 1]
... MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT
... GSEELRSLYNTVATLYCVHQRIEIKDTKEALDKIEEEQNKSKKKAQQAAADTGHSNQVSQNYPIVQNIQG
... QMVHQAISPRTLNAWVKVVEEKAFSPEVIPMFSALSEGATPQDLNTMLNTVGGHQAAMQMLKETINEEAA
... EWDRVHPVHAGPIAPGQMREPRGSDIAGTTSTLQEQIGWMTNNPPIPVGEIYKRWIILGLNKIVRMYSPT
... SILDIRQGPKEPFRDYVDRFYKTLRAEQASQEVKNWMTETLLVQNANPDCKTILKALGPAATLEEMMTAC
... QGVGGPGHKARVLAEAMSQVTNSATIMMQRGNFRNQRKIVKCFNCGKEGHTARNCRAPRKKGCWKCGKEG
... HQMKDCTERQANFLGKIWPSYKGRPGNFLQSRPEPTAPPEESFRSGVETTTPPQKQEPIDKELYPLTSLR
... SLFGNDPSSQ
... ''')
>>> record.id
'gi|9629360|ref|NP_057850.1|'
>>> record.description
'gi|9629360|ref|NP_057850.1| Gag [Human immunodeficiency virus type 1]'
>>> record.seq[0:5]
Seq('MGARA', SingleLetterAlphabet())
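The id and description values in the doctest above follow a simple rule: the description is the whole header line without the leading ">", and the id is its first whitespace-delimited token. The sketch below reproduces that split in plain Python. It is not the implementation of fasta_single; the helper name parse_single_fasta is hypothetical, the sequence is returned as a plain string rather than a Seq object, and the example input is a shortened version of the record above.

```python
def parse_single_fasta(text):
    """Parse one FASTA record from a string (hypothetical helper).

    Returns (id, description, sequence). The description is the header
    line without '>', and the id is its first whitespace-delimited token.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    description = lines[0][1:].strip()
    record_id = description.split(None, 1)[0] if description else ""
    sequence = "".join(lines[1:])  # sequence lines are simply concatenated
    return record_id, description, sequence


# Shortened input: only the first and last sequence lines of the record above.
record_id, description, sequence = parse_single_fasta("""
>gi|9629360|ref|NP_057850.1| Gag [Human immunodeficiency virus type 1]
MGARASVLSGGELDRWEKIRLRPGGKKKYKLKHIVWASRELERFAVNPGLLETSEGCRQILGQLQPSLQT
SLFGNDPSSQ
""")
print(record_id)     # gi|9629360|ref|NP_057850.1|
print(sequence[:5])  # MGARA
```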

fasta_readrecords(filename=None)

>>> records = fasta_readrecords('GFF/multi.fna')
>>> records[0].id
'test1'
>>> records[2].seq
Seq('AAACACAC', SingleLetterAlphabet())

genbank_single(filename)

>>> record = genbank_single("GFF/NC_001422.gbk")
>>> record.taxonomy
['Viruses', 'ssDNA viruses', 'Microviridae', 'Microvirus']
>>> cds = record.features[-4]
>>> cds.key
'CDS'
>>> location = LocationFromString(cds.location)
>>> print location
2931..3917
>>> subseq = record_subseq(record, location)
>>> subseq[0:20]
Seq('ATGTTTGGTGCTATTGCTGG', Alphabet())

record_subseq(record, location, *args, **keywds)

>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> record = SeqRecord(Seq("gagttttatcgcttccatga"),
...                    "ref|NC_001422",
...                    "Coliphage phiX174, complete genome",
...                    "bases 1-11")
>>> record_subseq(record, LocationFromString("1..4")) # one-based
Seq('GAGT', Alphabet())
>>> record_subseq(record, LocationFromString("complement(1..4)")) # one-based
Seq('ACTC', Alphabet())
>>> record_subseq(record, LocationFromString("join(complement(1..4),1..4)")) # what an idea!
Seq('ACTCGAGT', Alphabet())
>>> loc = LocationFromString("complement(join(complement(5..7),1..4))")
>>> print loc
complement(join(complement(5..7),1..4))
>>> record_subseq(record, loc)
Seq('ACTCTTT', Alphabet())
>>> print loc
complement(join(complement(5..7),1..4))
>>> loc.reverse()
>>> record_subseq(record, loc)
Seq('AAAGAGT', Alphabet())
>>> record_subseq(record, loc, upper=1)
Seq('AAAGAGT', Alphabet())
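The nested locations in the doctests above can be evaluated recursively: complement(...) reverse-complements whatever its argument yields, join(...) concatenates its parts in order, and a..b is a one-based inclusive range. The following is a minimal sketch of that idea on plain strings, assuming unambiguous DNA letters only. It reuses three patterns from the module's Variables list, but the functions reverse_complement, split_top_level and extract are hypothetical, ignore fuzzy markers, and are not the module's own implementation of record_subseq.

```python
import re

# Patterns copied from the module's Variables list.
re_complement = re.compile(r'^complement\((.*)\)$')
re_join = re.compile(r'^(join|order)\((.*)\)$')
re_dotdot = re.compile(r'^([><]*\d+)\.\.([><]*\d+)$')

# Assumption: only a, c, g, t (either case) occur in the sequence.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def split_top_level(text):
    """Split on commas that are not nested inside parentheses."""
    parts, depth, current = [], 0, []
    for char in text:
        if char == "," and depth == 0:
            parts.append("".join(current))
            current = []
            continue
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        current.append(char)
    parts.append("".join(current))
    return parts


def extract(seq, location):
    """Return the part of seq described by a location string."""
    match = re_complement.match(location)
    if match:
        return reverse_complement(extract(seq, match.group(1)))
    match = re_join.match(location)
    if match:
        parts = split_top_level(match.group(2))
        return "".join(extract(seq, part) for part in parts)
    match = re_dotdot.match(location)
    if match:
        # Fuzzy markers (< and >) are dropped in this sketch.
        start, end = (int(group.lstrip("<>")) for group in match.groups())
        return seq[start - 1:end]  # one-based inclusive -> Python slice
    raise ValueError("unsupported location: %r" % location)


seq = "gagttttatcgcttccatga"
print(extract(seq, "join(complement(1..4),1..4)").upper())
# ACTCGAGT
print(extract(seq, "complement(join(complement(5..7),1..4))").upper())
# ACTCTTT
```

The doctest outputs are upper case, while this sketch leaves case untouched, so the usage lines upper-case the result for comparison.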

record_sequence(record)

Returns the sequence of a record.

The record can be a Bio.SeqRecord.SeqRecord or a Bio.GenBank.Record.Record.

record_coords(record, start, end, strand=0, upper=0)

>>> from Bio.Seq import Seq
>>> from Bio.SeqRecord import SeqRecord
>>> record = SeqRecord(Seq("gagttttatcgcttccatga"),
...                    "ref|NC_001422",
...                    "Coliphage phiX174, complete genome",
...                    "bases 1-11")
>>> record_coords(record, 0, 4) # zero-based
Seq('GAGT', Alphabet())
>>> record_coords(record, 0, 4, "-") # zero-based
Seq('ACTC', Alphabet())
>>> record_coords(record, 0, 4, "-", upper=1) # zero-based
Seq('ACTC', Alphabet())
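The comments in these doctests mark record_coords as zero-based, whereas location strings such as "1..4" are one-based. Both address the same bases: a one-based inclusive range a..b corresponds to the zero-based half-open bounds (a - 1, b). The sketch below shows that conversion together with strand handling on a plain string. The names one_based_to_slice and coords_subseq are hypothetical, the strand default of None is an assumption of this sketch (the module's signature uses strand=0), and only unambiguous DNA letters are complemented.

```python
# Assumption: only a, c, g, t (either case) occur in the sequence.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def one_based_to_slice(start, end):
    """Convert a one-based inclusive range to zero-based half-open bounds."""
    return start - 1, end


def coords_subseq(seq, start, end, strand=None, upper=False):
    """Slice seq[start:end]; reverse-complement it when strand is '-'."""
    sub = seq[start:end]
    if strand == "-":
        sub = sub.translate(_COMPLEMENT)[::-1]
    return sub.upper() if upper else sub


seq = "gagttttatcgcttccatga"
print(coords_subseq(seq, 0, 4, upper=True))       # GAGT
print(coords_subseq(seq, 0, 4, "-", upper=True))  # ACTC

# The one-based range 1..4 selects the same bases as the zero-based (0, 4).
start, end = one_based_to_slice(1, 4)
print(coords_subseq(seq, start, end, upper=True))  # GAGT
```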

_test()

Run the Bio.GFF.easy module's doctests (PRIVATE).

This tries to locate the unit tests directory and runs the doctests
from there so that the relative paths used in the examples work.


Variables Details

re_seqname

Value:
re.compile(r'^(?!join|order|complement)([^:]+?):(.*)$')
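The negative lookahead at the start of re_seqname matters when a join contains a remote reference. Without it, the first colon inside the join would be treated as the end of a sequence name. The sketch below compares the pattern with a lookahead-free variant; re_no_lookahead and split_seqname are hypothetical names introduced here for illustration only.

```python
import re

# Pattern copied from the value shown above.
re_seqname = re.compile(r'^(?!join|order|complement)([^:]+?):(.*)$')
# Same pattern without the lookahead, for comparison only (not in the module).
re_no_lookahead = re.compile(r'^([^:]+?):(.*)$')


def split_seqname(location):
    """Return (seqname, rest); seqname is None when no prefix is present."""
    match = re_seqname.match(location)
    if match:
        return match.group(1), match.group(2)
    return None, location


print(split_seqname("J00194:100..202"))
# ('J00194', '100..202')
print(split_seqname("join(J00194:100..202,1..3)"))
# (None, 'join(J00194:100..202,1..3)')
print(re_no_lookahead.match("join(J00194:100..202,1..3)").groups())
# ('join(J00194', '100..202,1..3)')
```

One consequence of the lookahead is that a sequence name which itself begins with "join", "order" or "complement" is not recognized as a prefix by this pattern.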