Talk:SeqIO

From Biopython
(Difference between revisions)
(Added a notice and some replies.)
(How to specify SeqRecord Alphabet: Bug 2443 is fixed now)
 
Latest revision as of 09:43, 4 November 2008

Hello everyone,

  • If you want help on using Bio.SeqIO, please join the discussion mailing list (see mailing lists).
  • If you think you've found a bug, please report it on bugzilla.
  • If you have a nice example you'd like to add, I would encourage you to join the discussion mailing list, but posting the code here should be fine.

Thanks,

Peter

New example, Random Fragments

I added an example to the output section, but now I wonder if this is the best place to put it. Maybe it should be under "examples". --Sbassi 01:20, 27 August 2007 (EDT)

Peter: Nice idea - and yes, I would have put it under examples too. I've moved it and edited it as well - I wanted the style to match the rest of the page, and I think there was a possible problem in your randomising code where a sequence might run over the end of the record. I opted to simplify the example - I hope you're still happy with it.

EMBL Problem?

Hi, I am new to Biopython. I tested Bio.SeqIO.parse on the EMBL formatted miRNA.dat file from the microRNA Registry, but I got an error. Is the EMBL format fully supported? This is what I ran:

>>> from Bio import SeqIO
>>> handle = open('/home/liang/Desktop/miRNA.dat','rU')
>>> record_iterator = SeqIO.parse(handle,'embl')
>>> first_record=record_iterator.next()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 410, in parse_records
    record = self.parse(handle)
  File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 393, in parse
    if self.feed(handle, consumer) :
  File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 360, in feed
    self._feed_first_line(consumer, self.line)
  File "/usr/lib/python2.5/site-packages/Bio/GenBank/Scanner.py", line 540, in _feed_first_line
    assert len(fields) == 7
AssertionError

  • Questions like this will probably be answered faster if you post them to the Biopython mailing list. --Cjfields 16:03, 23 May 2007 (EDT)
  • Peter: As Chris suggested, your question would have been noticed much earlier on the mailing list - but filing a bug would have been a better idea (see bugzilla). Could you do that, with a bit more information on where the example file came from? Thanks.
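
One possible reading of the traceback above: the last frame fails on `assert len(fields) == 7`, which suggests the scanner expects the first line of each record to break into seven fields. The sketch below only illustrates that kind of check; it is not the code in Bio/GenBank/Scanner.py. The helper name `split_id_line`, the semicolon separator, and both sample ID lines are assumptions made up for the example.

```python
def split_id_line(line):
    """Split an EMBL-style ID line into its semicolon-separated fields.

    Hypothetical helper: the real scanner may tokenise the line differently.
    """
    # Drop the two-letter "ID" line code, then split on semicolons.
    body = line[2:].strip().rstrip(".")
    return [field.strip() for field in body.split(";")]


# Assumed sample lines, invented for illustration only.
seven_field_line = "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."
four_field_line = "ID   example-entry   standard; RNA; XXX; 99 BP."

for line in (seven_field_line, four_field_line):
    fields = split_id_line(line)
    status = "passes" if len(fields) == 7 else "would trip the assertion"
    print(len(fields), "fields:", status)
```

If the first line of miRNA.dat looks more like the second sample than the first, that would be worth mentioning in the bug report.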

How to specify SeqRecord Alphabet

Hi, nice work with this module!

The to_dict() method creates a dictionary with biopython SeqRecord objects.

record_dict = SeqIO.to_dict(SeqIO.parse(handle, "fasta"))

How can I specify which Alphabet they should have? By default, they have SingleLetterAlphabet.

Chromosome cneo_JEC21:cn-jec21_chr11:
ID: cneo_JEC21:cn-jec21_chr11
Name: cneo_JEC21:cn-jec21_chr11
Desription: cneo_JEC21:cn-jec21_chr11
Seq('CCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCCTAACCCCTAACCCCCTAACCCTA ...', SingleLetterAlphabet())

Peter: The to_dict() method doesn't do anything with alphabets - your question is really about the parse() function. For Biopython 1.49 we've extended parse() and read() to take an optional alphabet, to cover this exact situation where the file format doesn't specify the alphabet (see enhancement bug 2443).

In the short term, if you need to specify the alphabet, then the best I can suggest is something like this:

from Bio import SeqIO
from Bio.Alphabet import generic_dna
# Load every record into memory, then overwrite each default alphabet
records = list(SeqIO.parse(handle, "fasta"))
for record in records:
    record.seq.alphabet = generic_dna
record_dict = SeqIO.to_dict(records)

Or, perhaps:

from Bio import SeqIO
from Bio.Alphabet import generic_dna
def force_dna(record):
    # Overwrite the default alphabet and return the same record
    record.seq.alphabet = generic_dna
    return record
record_dict = SeqIO.to_dict(map(force_dna, SeqIO.parse(handle, "fasta")))

Or, if you like generators and are using Python 2.4 or later:

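from Bio import SeqIO
from Bio.Alphabet import generic_dna
def force_dna(record):
    # Overwrite the default alphabet and return the same record
    record.seq.alphabet = generic_dna
    return record
# Generator expression: each record is adjusted as it is read
record_dict = SeqIO.to_dict(force_dna(rec) for rec in SeqIO.parse(handle, "fasta"))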
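
The three variants above differ mainly in when the records get processed: the first builds a full list up front, while the generator version adjusts each record as it is read. Here is a minimal sketch of that pattern which runs without Biopython. `StubRecord`, `fake_parse` and `to_dict` are hypothetical stand-ins for SeqRecord, SeqIO.parse and SeqIO.to_dict, the alphabet is stored directly on the record rather than on record.seq to keep things short, and the string "generic_dna" is only a placeholder for the real alphabet object.

```python
class StubRecord:
    """Hypothetical stand-in for a SeqRecord: an id, a sequence, an alphabet."""

    def __init__(self, id, seq, alphabet="SingleLetterAlphabet"):
        self.id = id
        self.seq = seq
        self.alphabet = alphabet


def fake_parse(lines):
    """Yield one StubRecord per 'id sequence' line (stand-in for SeqIO.parse)."""
    for line in lines:
        identifier, sequence = line.split()
        yield StubRecord(identifier, sequence)


def force_dna(record):
    """Overwrite the default alphabet and hand the same record back."""
    record.alphabet = "generic_dna"  # placeholder for the real alphabet object
    return record


def to_dict(records):
    """Key records by id, refusing duplicates (stand-in for SeqIO.to_dict)."""
    result = {}
    for record in records:
        if record.id in result:
            raise ValueError("Duplicate key %r" % record.id)
        result[record.id] = record
    return result


lines = ["chr1 CCCCCTAACC", "chr2 GGGATTACA"]
# to_dict pulls records from the generator expression one at a time.
record_dict = to_dict(force_dna(rec) for rec in fake_parse(lines))
print(sorted(record_dict))
print(record_dict["chr1"].alphabet)
```

The generator form never holds a separate list of records, which may matter for large files, although the resulting dictionary still keeps every record in memory.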

Why not join the discussion mailing list (see mailing lists) and tell us what you think?
