Module FastaIO

Bio.SeqIO support for the "fasta" (aka FastA or Pearson) file format.

You are expected to use this module via the Bio.SeqIO functions.

Classes
  FastaWriter
    Class to write Fasta format files.
Functions
  SimpleFastaParser(handle)
    Generator function to iterate over Fasta records (as string tuples).
  FastaIterator(handle, alphabet=SingleLetterAlphabet(), title2ids=None)
    Generator function to iterate over Fasta records (as SeqRecord objects).
Variables
  __package__ = 'Bio.SeqIO'
Function Details

SimpleFastaParser(handle)

Generator function to iterate over Fasta records (as string tuples).

For each record, a tuple of two strings is returned: the FASTA title line (without the leading '>' character) and the sequence (with any whitespace removed). The title line is not split into an identifier (the first word) and a comment or description.

>>> with open("Fasta/dups.fasta") as handle:
...     for values in SimpleFastaParser(handle):
...         print(values)
...
('alpha', 'ACGTA')
('beta', 'CGTC')
('gamma', 'CCGCC')
('alpha (again - this is a duplicate entry to test the indexing code)', 'ACGTA')
('delta', 'CGCGC')
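
The tuple format described above can be illustrated with a minimal pure-Python sketch. The function name simple_fasta_tuples and the in-memory example data are hypothetical, introduced only for illustration; this is not Biopython's implementation, just one way to produce the same kind of (title, sequence) tuples. Because the title is not divided up, the last line shows how a caller might take the first word as an identifier.

```python
import io


def simple_fasta_tuples(handle):
    """Yield (title, sequence) string tuples from a FASTA handle.

    Hypothetical stand-in written for illustration; not Biopython's code.
    """
    title = None
    chunks = []
    for line in handle:
        if line.startswith(">"):
            # A new title line closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            # Drop the leading '>' and any trailing newline or whitespace.
            title = line[1:].rstrip()
            chunks = []
        elif title is not None:
            # Remove all whitespace from the sequence line.
            chunks.append("".join(line.split()))
    if title is not None:
        yield title, "".join(chunks)


# Made-up example data held in memory instead of a file on disk.
example = io.StringIO(">alpha first record\nACG\nTA\n>beta\nCG TC\n")
records = list(simple_fasta_tuples(example))

# The title is returned whole; splitting off the first word is up to the caller.
ids = [title.split(None, 1)[0] for title, _ in records]
```

Here records holds one tuple per FASTA entry, and ids holds the first word of each title.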

FastaIterator(handle, alphabet=SingleLetterAlphabet(), title2ids=None)

Generator function to iterate over Fasta records (as SeqRecord objects).

Arguments:

  • handle - input file handle
  • alphabet - optional alphabet
  • title2ids - a function that, when given the title line of a FASTA record (without the leading '>'), returns the id, name and description (in that order) for the record as a tuple of strings. If this is not given, the entire title line is used as the description, and its first word as both the id and the name.

By default this will act like calling Bio.SeqIO.parse(handle, "fasta") with no custom handling of the title lines:

>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle):
...         print(record.id)
...
alpha
beta
gamma
alpha
delta

However, you can supply a title2ids function to alter this:

>>> def take_upper(title):
...     return title.split(None, 1)[0].upper(), "", title
>>> with open("Fasta/dups.fasta") as handle:
...     for record in FastaIterator(handle, title2ids=take_upper):
...         print(record.id)
...
ALPHA
BETA
GAMMA
ALPHA
DELTA
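
As a further illustration of the title2ids contract (a function taking the title string and returning an id, name and description tuple), here is a sketch for pipe-delimited titles. The function name pipe_title2ids and the "db|accession|name" title layout are assumptions made for this example; they are not part of the module.

```python
def pipe_title2ids(title):
    """Return (id, name, description) for a 'db|accession|name text' title.

    Hypothetical helper; the pipe-delimited layout is an assumption.
    """
    words = title.split(None, 1)
    first_word = words[0] if words else ""
    fields = first_word.split("|")
    if len(fields) >= 3:
        # Use the accession as the id and the entry name as the name.
        return fields[1], fields[2], title
    # Otherwise mirror the documented default: first word as id and name,
    # entire title line as the description.
    return first_word, first_word, title


parsed = pipe_title2ids("sp|P12345|PROT_HUMAN Example protein")
fallback = pipe_title2ids("alpha (again)")
```

A function of this shape could be passed as FastaIterator(handle, title2ids=pipe_title2ids), in the same way as take_upper above.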