Talk:SeqIO

Revision as of 13:15, 29 August 2007

I added an example to the output section, but now I wonder whether that is the best place for it. Maybe it should go under "examples". --Sbassi 01:20, 27 August 2007 (EDT)


Hi, I am new to Biopython. I tested Bio.SeqIO.parse on the EMBL-formatted miRNA.dat file from the microRNA Registry, but I got an error. Is the EMBL format fully supported? Here is the session:

>>> from Bio import SeqIO
>>> handle = open('/home/liang/Desktop/miRNA.dat','rU')
>>> record_iterator = SeqIO.parse(handle,'embl')
>>> first_record=record_iterator.next()
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 410, in parse_records
   record = self.parse(handle)
 File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 393, in parse
   if self.feed(handle, consumer) :
 File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
   self._feed_first_line(consumer, self.line)
 File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 540, in _feed_first_line
   assert len(fields) == 7
AssertionError
  • Questions like this will probably be answered faster if you post them to the Biopython mailing list. --Cjfields 16:03, 23 May 2007 (EDT)

SeqIO.to_dict() - how to specify SeqRecord Alphabet

Hi, nice work with this module!

The to_dict() method creates a dictionary of Biopython SeqRecord objects.

record_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))

How can I specify which Alphabet they should have? By default, they have SingleLetterAlphabet.

Chromosome cneo_JEC21:cn-jec21_chr11:
ID: cneo_JEC21:cn-jec21_chr11
Name: cneo_JEC21:cn-jec21_chr11
Desription: cneo_JEC21:cn-jec21_chr11
Seq('CCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCTAACCCCCTAACCCTA ...', SingleLetterAlphabet())

Peter: The to_dict() method doesn't do anything with alphabets - your question is really about the parse() function. I've been thinking about extending parse() to take an optional alphabet to cover this exact situation, but for now it doesn't. Sorry. If you are using FASTA files, then the best I can suggest is something like this:

from Bio import SeqIO
from Bio.Alphabet import generic_dna

# Keep the parsed records in a list so the alphabet changes are not lost
records = list(SeqIO.parse(handle, "fasta"))
for record in records:
    record.seq.alphabet = generic_dna

record_dict = SeqIO.to_dict(records)
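One detail worth spelling out for the loop-based approach: a parser that reads from a file handle yields each record only once, so a second parse of the same handle finds nothing left to read. The following is a minimal sketch in plain Python, not Biopython code; `parse_records` and the dictionary-based records are hypothetical stand-ins for `SeqIO.parse` and `SeqRecord`, used only to show the behaviour.

```python
import io


def parse_records(handle):
    """Hypothetical stand-in for a one-pass parser: one record per line."""
    for line in handle:
        yield {"id": line.strip(), "alphabet": "single-letter"}


handle = io.StringIO("chr1\nchr2\n")

# First pass: walking the iterator reads the handle to its end.
for record in parse_records(handle):
    record["alphabet"] = "dna"

# Parsing the same handle again yields nothing, because it is already at EOF.
second_pass = list(parse_records(handle))

# Keeping the records in a list preserves the changes made in the loop.
handle.seek(0)
records = list(parse_records(handle))
for record in records:
    record["alphabet"] = "dna"
by_id = {record["id"]: record for record in records}
```

Holding every record in a list costs memory on large files, which is the trade-off the alternatives that follow avoid.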

Or, perhaps:

from Bio import SeqIO
from Bio.Alphabet import generic_dna

def force_dna(record):
    # Set the alphabet in place and hand the same record back
    record.seq.alphabet = generic_dna
    return record

record_dict = SeqIO.to_dict(map(force_dna, SeqIO.parse(handle, "fasta")))

Or, if you like generators and are using Python 2.4 or later:

from Bio import SeqIO
from Bio.Alphabet import generic_dna

def force_dna(record):
    # Set the alphabet in place and hand the same record back
    record.seq.alphabet = generic_dna
    return record

record_dict = SeqIO.to_dict(force_dna(rec) for rec in SeqIO.parse(handle, "fasta"))
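For anyone who wants to try the generator pattern without a FASTA file to hand, here is a rough sketch in plain Python. It assumes nothing about Biopython internals: `parse_records`, `to_dict` and the dictionary-based records are hypothetical stand-ins for `SeqIO.parse`, `SeqIO.to_dict` and `SeqRecord`. It only shows that each record is modified as it streams into the dictionary, in a single pass.

```python
import io


def parse_records(handle):
    """Hypothetical stand-in parser: yields one record per line, lazily."""
    for line in handle:
        yield {"id": line.strip(), "alphabet": "single-letter"}


def force_dna(record):
    """Change the record in place and return it so it can be chained."""
    record["alphabet"] = "dna"
    return record


def to_dict(records):
    """Hypothetical stand-in that indexes records by their id."""
    result = {}
    for record in records:
        if record["id"] in result:
            raise ValueError("Duplicate key %r" % record["id"])
        result[record["id"]] = record
    return result


handle = io.StringIO("chr1\nchr2\n")

# The generator expression applies force_dna to each record as it is read,
# so no intermediate list of records is built.
record_dict = to_dict(force_dna(rec) for rec in parse_records(handle))
```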
