From bugzilla-daemon at portal.open-bio.org Fri Aug 1 05:41:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 05:41:14 -0400
Subject: [Biopython-dev] [Bug 2561] New: SeqRecord format method to get a
string in a given file format
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
Summary: SeqRecord format method to get a string in a given file
format
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
If you have a SeqRecord, it is sometimes useful to be able to convert it
into a FASTA format string, or indeed any suitable file format. Note that this
only makes sense for file formats which support a single record, such as
sequential formats like FASTA, GenBank, EMBL, SwissProt, ...
See http://portal.open-bio.org/pipermail/biopython-dev/2008-June/003793.html
PEP 3101 "Advanced String Formatting" describes a new __format__ method for
objects wishing to support the new python format() function in Python 2.6 and
3.0, see http://www.python.org/dev/peps/pep-3101/
In the short term we could expose this functionality as a method named
tostring(), to_string(), to_format() or some other suitable suggestion. Using
tostring() would be consistent with the Bio.Seq.Seq and Bio.Seq.MutableSeq
objects (although those do not take a format argument).
This could be implemented using Bio.SeqIO with a StringIO handle, for example:
######################################
For the SeqRecord class, in Bio/SeqRecord.py
######################################
def tostring(self, format=None) :
    """Returns the record as a string in the specified file format.
    If the file format is omitted (default), the sequence itself is
    returned as a string.
    Otherwise the format should be a lower case string supported by
    Bio.SeqIO, which is used to turn the SeqRecord into a string."""
    if format :
        from StringIO import StringIO
        from Bio import SeqIO
        handle = StringIO()
        SeqIO.write([self], handle, format)
        handle.seek(0)
        return handle.read()
    else :
        #Return the sequence as a string
        return self.seq.tostring()
############################################
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 06:01:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 06:01:49 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011001.m71A1nmN003441@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-01 06:01 EST -------
We've had several people request this functionality, and I am keen to add
this. I think the only issue is the naming of the function (and any default
behaviour - for example calling the __str__ method if given no format).
P.S. As an obvious extension of this idea, it would make sense to me to add a
similar method to the Alignment object using Bio.AlignIO internally.
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 06:19:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 06:19:15 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011019.m71AJFmV004950@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #2 from mdehoon at ims.u-tokyo.ac.jp 2008-08-01 06:19 EST -------
I would be in favor of adding a .tostring(format) method to the SeqRecord
class.
If I am not mistaken, such a method would make SeqIO.write superfluous:
for record in records:
    handle.write(record.tostring(format))
does the same thing as
Bio.SeqIO.write(records, handle, format)
To keep the Biopython API clean, I would therefore suggest to add
record.tostring(format) and to remove SeqIO.write (after properly deprecating
it and having a bunch of releases with SeqIO.write deprecated).
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 06:26:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 06:26:05 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011026.m71AQ5jN005452@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-01 06:26 EST -------
(In reply to comment #2)
> I would be in favor of adding a .tostring(format) method to the SeqRecord
> class.
OK.
> If I am not mistaken, such a method would make SeqIO.write superfluous:
>
> for record in records:
>     handle.write(record.tostring(format))
>
> does the same thing as
>
> Bio.SeqIO.write(records, handle, format)
This would do the same thing ONLY for sequential file formats (which admittedly
are the most commonly used ones). It wouldn't work for anything more
structured with a file header/footer (e.g. any XML format, and most alignment
file formats).
> To keep the Biopython API clean, I would therefore suggest to add
> record.tostring(format) and to remove SeqIO.write (after properly deprecating
> it and having a bunch of releases with SeqIO.write deprecated).
I don't think we can or should deprecate Bio.SeqIO.write() for the reason
above.
Peter
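The distinction between sequential and structured formats can be illustrated with a minimal sketch in present-day Python 3. Everything here is a hypothetical stand-in (plain tuples instead of SeqRecord objects, a toy XML layout, hand-written writers), not Bio.SeqIO code:

```python
from io import StringIO

# Hypothetical stand-in records: (identifier, sequence) tuples, not SeqRecords.
records = [("seq1", "ACGT"), ("seq2", "GGCC")]


def fasta_string(record):
    """Return one record as a FASTA formatted string."""
    identifier, sequence = record
    return ">%s\n%s\n" % (identifier, sequence)


def write_fasta(records, handle):
    """Sequential format: the file is just the records one after another."""
    for record in records:
        handle.write(fasta_string(record))


def toy_xml_string(record):
    """Return one record as a complete (toy) XML document."""
    identifier, sequence = record
    return '<records>\n  <seq id="%s">%s</seq>\n</records>\n' % (identifier, sequence)


def write_toy_xml(records, handle):
    """Structured format: a single header and footer wrap all the records."""
    handle.write("<records>\n")
    for identifier, sequence in records:
        handle.write('  <seq id="%s">%s</seq>\n' % (identifier, sequence))
    handle.write("</records>\n")


def written(writer):
    """Run a writer against an in-memory handle and return the text."""
    handle = StringIO()
    writer(records, handle)
    return handle.getvalue()


fasta_joined = "".join(fasta_string(r) for r in records)
xml_joined = "".join(toy_xml_string(r) for r in records)

print(fasta_joined == written(write_fasta))   # per-record strings are enough
print(xml_joined == written(write_toy_xml))   # header and footer get repeated
```

For the FASTA-like case the two routes agree; for the toy XML case the concatenated per-record strings repeat the wrapper element, so they do not match the output of the whole-file writer.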
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 07:38:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 07:38:30 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011138.m71BcUB4008418@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-01 07:38 EST -------
(In reply to comment #2)
> I would be in favor of adding a .tostring(format) method to the SeqRecord
> class.
Were you happy with making the format optional, and defaulting to the full
sequence as a plain string (as in comment 0 of this bug)?
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 09:36:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 09:36:35 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011336.m71DaZ75013145@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-08-01 09:36 EST -------
> > If I am not mistaken, such a method would make SeqIO.write superfluous:
> >
> > for record in records:
> >     handle.write(record.tostring(format))
> >
> > does the same thing as
> >
> > Bio.SeqIO.write(records, handle, format)
>
> This would do the same thing ONLY for sequential file formats (which admittedly
> are the most commonly used ones). It wouldn't work for anything more
> structured with a file header/footer (e.g. any XML format, and most alignment
> file formats).
I see. Then indeed we still need Bio.SeqIO.write.
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 09:37:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 09:37:27 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011337.m71DbRs0013216@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-08-01 09:37 EST -------
(In reply to comment #4)
> (In reply to comment #2)
> > I would be in favor of adding a .tostring(format) method to the SeqRecord
> > class.
>
> Were you happy with making the format optional, and defaulting to the full
> sequence as a plain string (as in comment 0 of this bug)?
>
Yes that makes sense.
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 09:41:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 09:41:44 -0400
Subject: [Biopython-dev] [Bug 2446] Comments in CT tags cause
Bio.Sequencing.Ace.ACEParser to fail.
In-Reply-To:
Message-ID: <200808011341.m71DfiEe013502@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2446
------- Comment #5 from mdehoon at ims.u-tokyo.ac.jp 2008-08-01 09:41 EST -------
Some information about these comment blocks from the polyphred developers:
---------------
They are intentional, though I'm not sure they are limited to
Polyphred's tags.
The format that I have typically seen is more like this:
CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
First line.
Second line.
C}
}
Specifically, the CT block always seems to end with the regex '^}$' and
the COMMENT block always ends with '^C}$'. I assume the literal 'C' was
added on the assumption that non-COMMENT-aware parsers would always be
looking for the brace at the beginning of the line. It's not exactly a
C-like, flexible-whitespace format.
In Consed (13.95 Beta; don't ask) adding a tag with a comment produces
this format in the ACE file. I don't know whether this has been changed
in later versions.
Admittedly, the latest Consed documentation does not mention this style.
Since (at least some versions of) Consed produce comments in this style
in addition to Polyphred, I recommend that the BioPython parser be
adjusted to accept either one.
----
I guess we should store any COMMENT block in a new 'comment' member of the ct
class.
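The terminator rules described above ('^}$' closes the CT block, '^C}$' closes the COMMENT block) could be handled along these lines. This is only a sketch in present-day Python 3: parse_ct_block is a hypothetical helper, it is not the code in Bio.Sequencing.Ace, and it returns a plain tuple rather than filling in a ct object:

```python
def parse_ct_block(lines):
    """Parse one CT{ ... } block given as a list of lines.

    Returns (tag_line, comment_lines, other_lines).
    Hypothetical helper for illustration, not part of Bio.Sequencing.Ace.
    """
    iterator = iter(lines)
    if next(iterator).strip() != "CT{":
        raise ValueError("Expected the start of a CT block")
    tag_line = next(iterator).strip()
    comment = []
    other = []
    for line in iterator:
        line = line.rstrip("\n")
        if line == "}":
            # A bare closing brace ('^}$') ends the CT block
            return tag_line, comment, other
        if line == "COMMENT{":
            # Collect everything up to the 'C}' terminator ('^C}$')
            for comment_line in iterator:
                comment_line = comment_line.rstrip("\n")
                if comment_line == "C}":
                    break
                comment.append(comment_line)
            else:
                raise ValueError("Unterminated COMMENT block")
        else:
            other.append(line)
    raise ValueError("Unterminated CT block")


example = """CT{
Contig1 repeat phrap 52 53 555456:555432
COMMENT{
First line.
Second line.
C}
}
""".splitlines()

tag_line, comment, other = parse_ct_block(example)
print(tag_line)
print(comment)
```

Because the inner loop only looks for 'C}', a line consisting of a lone '}' inside the comment would be kept as comment text rather than ending the CT block early.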
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 10:49:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 10:49:00 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808011449.m71En0Fp016936@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-01 10:48 EST -------
Created an attachment (id=981)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=981&action=view)
Patch to Bio/SeqRecord.py and Bio/Align/Generic.py
This is a little different from the above suggestion:
(a) I am calling the method .to_format() rather than .tostring().
I think this makes it clearer that it is intended to give some kind of file
format, rather than being a variation on the str(...) functionality. Also,
this name seems to match the planned Python 2.6/3.0 feature fairly well.
We've already labeled the Seq/MutableSeq .tostring() method as "old" and
suggest using str(my_seq) in the documentation instead. Introducing new
methods called .tostring() for other objects could be seen as a step backwards.
(b) There is no default format.
For the SeqRecord, using the raw sequence as a string makes a good file
format neutral choice, but there is no obvious equivalent for the
Alignment object. Defaulting to FASTA format in both cases might make sense.
On the other hand, the new format() functionality in python will default to
using the str() behaviour in the absence of a format:
http://www.python.org/dev/peps/pep-3101/
> For all built-in types, an empty format specification will produce
> the equivalent of str(value). It is recommended that objects
> defining their own format specifiers follow this convention as
> well.
One might argue therefore that if we want a default format, our .to_format()
method should default to calling __str__ if no format is given - but the __str__
methods just give "pretty print" output for use at the command prompt.
I decided to go with the Zen-of-python rule #2 ("Explicit is better than
implicit" http://www.python.org/dev/peps/pep-0020/ ) as a good justification
for no default format. Also note that Bio.SeqIO and Bio.AlignIO don't have a
default format.
From bugzilla-daemon at portal.open-bio.org Fri Aug 1 12:09:08 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 1 Aug 2008 12:09:08 -0400
Subject: [Biopython-dev] [Bug 2446] Comments in CT tags cause
Bio.Sequencing.Ace.ACEParser to fail.
In-Reply-To:
Message-ID: <200808011609.m71G98Wg020875@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2446
mdehoon at ims.u-tokyo.ac.jp changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED
------- Comment #6 from mdehoon at ims.u-tokyo.ac.jp 2008-08-01 12:09 EST -------
Fixed in CVS.
Please use Ace.read(handle) instead of Ace.ACEParser().parse(handle),
and Ace.parse(handle) instead of Ace.Iterator(handle, Ace.RecordParser()).
From bugzilla-daemon at portal.open-bio.org Sat Aug 2 08:43:46 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 2 Aug 2008 08:43:46 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808021243.m72Chk6e008101@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #8 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-02 08:43 EST -------
Regarding the __format__ method, see this thread on the dev-mailing list in
June:
http://lists.open-bio.org/pipermail/biopython-dev/2008-June/003816.html
This suggests something like this could be used to support the format()
function in Python 2.6/3.0:
def __format__(self, format_spec=None):
    """Format the SeqRecord into a string.
    This method supports the python format() function added in
    Python 2.6/3.0. The format_spec should be a lower case
    string supported by Bio.SeqIO as an output file format.
    See also the to_format() method."""
    if format_spec:
        return self.to_format(format_spec)
    else :
        #Follow python convention and default to using __str__
        return str(self)
[And similar for the Alignment object]
We can add this new method without causing any problems for older versions of
Python, as they will ignore the new method.
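How the built-in format() function picks up such a method can be tried out with a small stand-in class. This is a sketch in present-day Python 3: ToyRecord and its two hard-coded formats are hypothetical, and a real SeqRecord would hand the work to Bio.SeqIO instead:

```python
class ToyRecord(object):
    """Hypothetical stand-in for SeqRecord, with just an id and a sequence."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __str__(self):
        # "Pretty print" summary, as used at the command prompt
        return "ID: %s\nSeq: %s" % (self.id, self.seq)

    def to_format(self, format):
        """Return the record as a string in the given file format (no default)."""
        if format == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        if format == "tab":
            return "%s\t%s\n" % (self.id, self.seq)
        raise ValueError("Unknown format %r" % (format,))

    def __format__(self, format_spec):
        """Support the built-in format() function."""
        if format_spec:
            return self.to_format(format_spec)
        # Follow the Python convention and fall back on str()
        return str(self)


record = ToyRecord("demo", "ACGT")
print(format(record, "fasta"))
print(format(record, "tab"))
print(format(record) == str(record))
```

Calling format(record) without a format specification passes an empty string to __format__, which is why the last line falls through to the __str__ output.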
From bugzilla-daemon at portal.open-bio.org Sat Aug 2 08:51:49 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 2 Aug 2008 08:51:49 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808021251.m72CpnJ4009207@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #9 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-02 08:51 EST -------
As a further remark on re-reading Jared's email and PEP 3101, given the string
class is getting a .format method which just calls the .__format__ method,
perhaps we should call our new method just "format" rather than "to_format" (or
"tostring" or any of the other options).
e.g.
from Bio import SeqIO
for record in SeqIO.parse(open("ls_orchids.gbk"),"genbank") :
    print record #uses __str__ method
    print record.format("fasta")
    print record.format("tab")
from Bio import AlignIO
align = AlignIO.read(open("example.aln"),"clustal")
print align #uses __str__ method
print align.format("fasta")
print align.format("nexus")
print align.format("clustal")
And, if using python 2.6 or 3.0 you could also do:
print format(record, "fasta") #uses __format__ method
print format(align, "clustal") #uses __format__ method
From bugzilla-daemon at portal.open-bio.org Sat Aug 2 09:57:18 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 2 Aug 2008 09:57:18 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808021357.m72DvIxN018971@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
biopython-bugzilla at maubp.freeserve.co.uk changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
 Attachment #981 is|0                           |1
           obsolete|                            |
------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-02 09:57 EST -------
Created an attachment (id=982)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=982&action=view)
Patch to Bio/SeqRecord.py and Bio/Align/Generic.py (revised)
Revised patch, using .format() as the method name (with no default), and adding
the __format__ method too (which defaults to calling __str__ if no format is
given).
[Once we settle on the naming, and check this in, I'll also update the unit
tests to use this new functionality]
From bugzilla-daemon at portal.open-bio.org Sat Aug 2 12:06:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 2 Aug 2008 12:06:23 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808021606.m72G6NNV007368@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
------- Comment #11 from jflatow at northwestern.edu 2008-08-02 12:06 EST -------
In re-reading the PEP, the string method `.format` does not just call the
`.__format__` method, they actually have two different signatures:
def format(self, *args, **kwargs):
    ...
and
def __format__(self, format_spec):
    ...
I still think it's reasonable that *ours* should just call `.__format__`, like
the builtin will. However, this might suggest that reversing the function calls
s.t. `.format` calls `.__format__` and all the work is done in the latter might
be better. I think `.__format__` is the true underlying mechanism, whose
behavior is specced out explicitly in the PEP, while `.format` is really just a
convenience method, whose signature could potentially change to make it even
more convenient. This will also prevent arguments claiming that it is less
efficient to use 'This is my record {0:fasta}'.format(record) than 'This is my
record {0}'.format(record.format('fasta')), though I doubt that's much of an
issue.
One other small point is that for 2.x, string objects and unicode objects are
different and so the PEP suggests that any implementation of the `.__format__`
method should check the type of the format_spec argument to see if it's unicode
or string and return the appropriate type.
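The reversed arrangement, with the work done in `.__format__` and `.format` as a thin wrapper, might look roughly like this. It is a sketch only: ToyRecord is a hypothetical stand-in for SeqRecord with a single hard-coded format, and it is written for Python 3, where the str/unicode distinction mentioned above does not arise:

```python
class ToyRecord(object):
    """Hypothetical stand-in for SeqRecord; only FASTA output is sketched."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __str__(self):
        return "ID: %s\nSeq: %s" % (self.id, self.seq)

    def __format__(self, format_spec):
        """All the real work is done here."""
        if not format_spec:
            # Python convention: an empty specification means str()
            return str(self)
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        raise ValueError("Unknown format %r" % (format_spec,))

    def format(self, format):
        """Convenience wrapper which just calls __format__."""
        return self.__format__(format)


record = ToyRecord("demo", "ACGT")
direct = "This is my record {0:fasta}".format(record)
indirect = "This is my record {0}".format(record.format("fasta"))
print(direct == indirect)
```

Both spellings go through the same `__format__` code path here and give the same string. One side effect of this layout is that record.format("") would quietly return the __str__ output, unless the wrapper explicitly rejects an empty format.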
(In reply to comment #10)
> Created an attachment (id=982)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=982&action=view) [details]
> Patch to Bio/SeqRecord.py and Bio/Align/Generic.py (revised)
>
> Revised patch, using .format() as the method name (with no default), and adding
> the __format__ method too (which defaults to calling __str__ if no format is
> given).
>
> [Once we settle on the naming, and check this in, I'll also update the unit
> tests to use this new functionality]
>
From sbassi at gmail.com Sun Aug 3 18:35:56 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Sun, 3 Aug 2008 19:35:56 -0300
Subject: [Biopython-dev] Name problem in BLAST parser?
Message-ID:
Hello,
>>> from Bio.Blast import NCBIXML
>>> blast_records = NCBIXML.parse(res)
>>> record = blast_records.next()
>>> record.database_length
>>> record.num_letters_in_database
39588516
So if we are going to retrieve the database length field, why call it
num_letters_in_database? I guess that the reply is: This field is
called '' in the XML but most people know it as
'Number of letters in database' as it is displayed in the HTML BLAST
output.
That's OK, but why have an empty "database_length" attribute?
I am thinking of two solutions for this:
1) Just delete the "database_length" attribute.
2) Make "database_length" another name for "num_letters_in_database".
Maybe there is another solution that I am not aware of.
Best,
SB.
--
Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Tutorial libre de Python: http://tinyurl.com/2az5d5
From bugzilla-daemon at portal.open-bio.org Mon Aug 4 05:12:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 Aug 2008 05:12:41 -0400
Subject: [Biopython-dev] [Bug 2558] Bio.Nexus chokes on TRANSLATE block with
superfluous comma
In-Reply-To:
Message-ID: <200808040912.m749CfTp016990@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2558
fkauff at biologie.uni-kl.de changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |INVALID
------- Comment #3 from fkauff at biologie.uni-kl.de 2008-08-04 05:12 EST -------
... as nobody objected and/or brought an additional example of 'trailing
commas', I close this bug.
Frank
(In reply to comment #2)
> I'm all for a little bit of slack in parsers, but this looks in my opinion like
> a straightforward syntax error in the nexus file. I work with nexus files
> daily, and have never encountered such a trailing comma. What really confuses
> me is that there are 58 taxa in the data set, and no. 59 Lecanorales is in
> addition, with no data and no occurrence in the tree. I don't think this is
> proper nexus format.
>
> Frank
>
>
>
> (In reply to comment #1)
> > This is an issue in the Bio.Nexus module, so it's a job for Frank.
> >
> > Do you know if this affects all the NEXUS files from www.treebase.org? I've
> > tried downloading several trees, but their FTP site is just timing out for me.
> > According to http://www.treebase.org/treebase/submit.html they request trees be
> > uploaded in the NEXUS file format so it's possible that just a minority of their
> > trees have this trailing comma.
> >
> > Note that this may be an invalid file (a TRANSLATE block with trailing comma),
> > but as you say it looks relatively straightforward to cope with. However, I
> > have had a quick look at the Bio.Nexus code, and I don't entirely understand
> > what Frank's parser is doing here - so it's not going to be a quick fix from me.
> >
> >
From biopython at maubp.freeserve.co.uk Mon Aug 4 05:49:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 4 Aug 2008 10:49:50 +0100
Subject: [Biopython-dev] Name problem in BLAST parser?
In-Reply-To:
References:
Message-ID: <320fb6e00808040249q26f65087q9f7d3c07d47d1c1b@mail.gmail.com>
On Sun, Aug 3, 2008 at 11:35 PM, Sebastian Bassi wrote:
> Hello,
>
>>>> from Bio.Blast import NCBIXML
>>>> blast_records = NCBIXML.parse(res)
>>>> record = blast_records.next()
>>>> record.database_length
>>>> record.num_letters_in_database
> 39588516
>
> So if we are going to retrieve the database length field, why call it
> num_letters_in_database? I guess that the reply is: This field is
> called '' in the XML but most people know it as
> 'Number of letters in database' as it is displayed in the HTML BLAST
> output.
Good question. I think the name was picked in the plain text parser,
and maintained in the XML parser. However, things are more
complicated...
Strangely the plain text BLAST format contains this information three
times (!), once in the header (for each query) and then again at the
end of the file in the database report and the parameters "total
letters" and again as "length of database", e.g.
http://bugzilla.open-bio.org/attachment.cgi?id=676
...
Database: Leigo
4,535,438 sequences; 1,573,298,872 total letters
...
Database: Leigo
Posted date: Jan 22, 2007 11:26 AM
Number of letters in database: 1,573,298,872
Number of sequences in database: 4,535,438
...
Length of database: 1,573,298,872
...
The Bio.Record.Header class defines "database_letters" (this is
repeated every query), Bio.Record.DatabaseReport defines
"num_letters_in_database", and Bio.Record.Parameters class defines
"database_length" (where the names reflect the NCBI strings). The
Bio.Record.Record inherits from all three, so ends up with
"database_letters", "database_length" and "num_letters_in_database"
(all coming from different bits of a plain text BLAST file). I am
assuming that these three numbers should agree, but the design allows
for the fact they may not (I would have used a single name and checked
they were the same).
From a quick check, in the XML output the database length is found
only in the statistics block (repeated for each query), as you stated,
called ''. As this is per-query, the closest match
to the original trio is the one in each query's header,
"database_letters", but instead in the initial XML parser this was
mapped to "num_letters_in_database".
> Thats OK but why having an empty "database_length" attribute?
> I am thinking in two solutions for this:
> 1) Just delete the "database_length" attribute.
> 2) Make "database_length" another name for "num_letters_in_database".
> Maybe there is another solution that I am not aware of.
Regarding idea (2), as the plain text parser fills in both
"num_letters_in_database" and "database_length" and
"database_letters" (from different parts of the file), I think for
consistency one could argue that the XML parser should also fill in
all three! On the other hand, having the same information in three
places is crazy and un-pythonic.
In the long run perhaps we should deprecate the "database_length" and
"database_letters" properties of the Record class (and make the
plain text parser just check that all three agree)? This is a variation on
your idea (1).
Peter
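The way one record object ends up carrying three differently named database sizes can be mimicked with a few stand-in classes. This is only a sketch in present-day Python 3: the class bodies are hypothetical and heavily cut down (the real classes hold many more fields), unset_size_fields is an invented helper, and the numbers are the ones quoted in this thread:

```python
class Header(object):
    """Stand-in for the per-query header information."""
    def __init__(self):
        self.database_letters = None


class DatabaseReport(object):
    """Stand-in for the database report at the end of the output."""
    def __init__(self):
        self.num_letters_in_database = None


class Parameters(object):
    """Stand-in for the parameters block at the end of the output."""
    def __init__(self):
        self.database_length = None


class Record(Header, DatabaseReport, Parameters):
    """Inherits from all three, so it carries all three size fields."""
    def __init__(self):
        Header.__init__(self)
        DatabaseReport.__init__(self)
        Parameters.__init__(self)


SIZE_FIELDS = ("database_letters", "num_letters_in_database", "database_length")


def unset_size_fields(record):
    """Return the names of the size fields a parser left as None."""
    return [name for name in SIZE_FIELDS if getattr(record, name) is None]


# Plain text parser: each field is filled from a different part of the file
text_record = Record()
text_record.database_letters = 1573298872
text_record.num_letters_in_database = 1573298872
text_record.database_length = 1573298872

# XML parser: only one of the three fields is filled in
xml_record = Record()
xml_record.num_letters_in_database = 39588516

print(unset_size_fields(text_record))
print(unset_size_fields(xml_record))
```

A helper of this kind would also give the plain text parser a single place to compare the three values, should that check be wanted.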
From biopython at maubp.freeserve.co.uk Mon Aug 4 05:54:33 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 4 Aug 2008 10:54:33 +0100
Subject: [Biopython-dev] Name problem in BLAST parser?
In-Reply-To: <320fb6e00808040249q26f65087q9f7d3c07d47d1c1b@mail.gmail.com>
References:
<320fb6e00808040249q26f65087q9f7d3c07d47d1c1b@mail.gmail.com>
Message-ID: <320fb6e00808040254o5112491fpf9d515f16e543cf0@mail.gmail.com>
>> Thats OK but why having an empty "database_length" attribute?
>> I am thinking in two solutions for this:
>> 1) Just delete the "database_length" attribute.
>> 2) Make "database_length" another name for "num_letters_in_database".
>> Maybe there is another solution that I am not aware of.
>
> Regarding idea (2), as the plain text parser fills in both
> "num_letters_in_database" and "database_length" and
> "database_letters" (from different parts of the file), I think for
> consistency one could argue that the XML parser should also fill in
> all three! On the other hand, having the same information in three
> places is crazy and un-pythonic.
>
> In the long run perhaps we should deprecate the "database_length" and
> "database_letters" properties of the Record class (and just make the
> plain text parser just check all three agree)? This is a variation on
> your idea (1).
Actually, until four months ago the XML parser didn't publicly expose
the database length at all. Only then, as you pointed out, did I change the
private record._num_letters_in_database to
record.num_letters_in_database, so maybe we could standardise on the
more natural "database_length" instead?
Peter
From biopython at maubp.freeserve.co.uk Mon Aug 4 06:30:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 4 Aug 2008 11:30:46 +0100
Subject: [Biopython-dev] Name problem in BLAST parser?
In-Reply-To: <320fb6e00808040249q26f65087q9f7d3c07d47d1c1b@mail.gmail.com>
References:
<320fb6e00808040249q26f65087q9f7d3c07d47d1c1b@mail.gmail.com>
Message-ID: <320fb6e00808040330g3d91494bue3f1321272eb7a5@mail.gmail.com>
> Strangely the plain text BLAST format contains this information three
> times (!), once in the header (for each query) and then again at the
> end of the file in the database report and the parameters "total
> letters" and again as "length of database", e.g.
> http://bugzilla.open-bio.org/attachment.cgi?id=676
>
> ...
> Database: Leigo
> 4,535,438 sequences; 1,573,298,872 total letters
> ...
> Database: Leigo
> Posted date: Jan 22, 2007 11:26 AM
> Number of letters in database: 1,573,298,872
> Number of sequences in database: 4,535,438
> ...
> Length of database: 1,573,298,872
> ...
At the suggestion of Leighton (off list) I checked out the -z option
and what this does to the reported database length.
If the -z option is used, only the last of these three database sizes in
the plain text output is changed (tested using standalone BLAST
2.2.18, which Biopython can parse for single queries). Using the
Biopython plain text parser, "database_letters" and
"num_letters_in_database" reflect the real database size, while
"database_length" reflects the -z argument (which is used in the
statistics). My naive assumption that the three values would always be
the same has been invalidated.
If the -z option is used with XML output, then is
updated. As far as I can tell, the "real" database size is not
reported. This suggests that, to match the old plain text parser, the field
should have been called "database_length" when parsing the XML.
Peter
From bugzilla-daemon at portal.open-bio.org Mon Aug 4 11:26:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 Aug 2008 11:26:06 -0400
Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO
In-Reply-To:
Message-ID: <200808041526.m74FQ67S002151@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2294
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|howard.salis at gmail.com |biopython-dev at biopython.org
Status|ASSIGNED |NEW
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-04 11:26 EST -------
Re-assigning back to the dev mailing list.
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Mon Aug 4 11:59:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 Aug 2008 11:59:27 -0400
Subject: [Biopython-dev] [Bug 2553] Adding SeqRecord objects to an alignment
(append or extend)
In-Reply-To:
Message-ID: <200808041559.m74FxRpH003851@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2553
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-04 11:59 EST -------
See also Bugzilla Bug 2554, Creating an Alignment from a list of SeqRecord
objects
There is some outline code there which also covers append and extend methods.
From bugzilla-daemon at portal.open-bio.org Mon Aug 4 12:34:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 4 Aug 2008 12:34:09 -0400
Subject: [Biopython-dev] [Bug 2294] Writing GenBank files with Bio.SeqIO
In-Reply-To:
Message-ID: <200808041634.m74GY9A8005566@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2294
------- Comment #13 from howard.salis at gmail.com 2008-08-04 12:34 EST -------
Sure, that sounds great.
(In reply to comment #11)
From bugzilla-daemon at portal.open-bio.org Thu Aug 7 05:54:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 7 Aug 2008 05:54:20 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808070954.m779sK0p021333@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #982 is|0 |1
obsolete| |
------- Comment #12 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-07 05:54 EST -------
(From update of attachment 982)
I've just checked in something based on this patch, but as suggested by Jared I
moved the actual code from format to __format__ instead.
(Quoting Jared, comment #11)
> I still think it's reasonable that *ours* should just call `.__format__`,
> like the builtin will. However, this might suggest that reversing the
> function calls s.t. `.format` calls `.__format__` and all the work is
> done in the latter might be better. I think `.__format__` is the true
> underlying mechanism, whose behavior is specced out explicitly in the PEP,
> while `.format` is really just a convenience method, whose signature could
> potentially change to make it even more convenient.
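
The arrangement described in the quote, with .format() as a thin wrapper
around __format__, can be sketched roughly as follows. This is a toy class,
not the code checked into Bio/SeqRecord.py: the Record class, its attribute
names and the hard-coded FASTA output are all assumptions made for
illustration only.

```python
class Record(object):
    """Toy stand-in for a sequence record (not Bio.SeqRecord.SeqRecord)."""

    def __init__(self, identifier, sequence, description=""):
        self.identifier = identifier
        self.sequence = sequence
        self.description = description

    def __format__(self, format_spec):
        # All the real work lives here, so the builtin format() and
        # str.format() pick it up without any extra code.
        if not format_spec:
            # PEP 3101 convention: an empty format spec behaves like str().
            return str(self)
        if format_spec == "fasta":
            header = " ".join(
                part for part in (self.identifier, self.description) if part
            )
            return ">%s\n%s\n" % (header, self.sequence)
        raise ValueError("Unsupported format: %r" % format_spec)

    def format(self, format_spec):
        # Convenience method only; it simply delegates to __format__.
        return self.__format__(format_spec)


if __name__ == "__main__":
    record = Record("example_id", "ACGT", "toy record")
    print(record.format("fasta"))
    print("{0:fasta}".format(record))
```

With this layout the convenience method could later gain extra arguments
without touching the behaviour that the PEP specifies for __format__.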
From bugzilla-daemon at portal.open-bio.org Thu Aug 7 08:02:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 7 Aug 2008 08:02:39 -0400
Subject: [Biopython-dev] [Bug 2564] New: Bio.Clustalw parsing fails if
CLUSTAL files omit version number
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2564
Summary: Bio.Clustalw parsing fails if CLUSTAL files omit version
number
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
This bug report is based on an email discussion with Nick Matzke on the mailing
list.
Some third party tools like PROMALS3D can mimic the CLUSTAL file format, e.g.
http://prodata.swmed.edu/promals3d/promals3d.php
In this case the first line reads:
CLUSTAL format multiple sequence alignment by PROMALS3D
rather than for example:
CLUSTAL W (1.81) multiple sequence alignment
CLUSTAL W (1.83) multiple sequence alignment
CLUSTAL 2.0.9 multiple sequence alignment
In my testing using Bio.AlignIO directly, the parser is happy with the
PROMALS3D output - but naturally cannot record a version number. However,
parsing via Bio.Clustalw expects the version number and fails (testing on CVS):
>>> from Bio import Clustalw
>>> from Bio.Alphabet import IUPAC, Gapped
>>> a = Clustalw.parse_file("promals3d.aln", Gapped(IUPAC.protein,"-"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/xxx/repositories/biopython/build/lib.macosx-10.3-i386-2.5/Bio/Clustalw/__init__.py", line 59, in parse_file
clustal_alignment._version = generic_alignment._version
AttributeError: Alignment instance has no attribute '_version'
I'll upload an example PROMALS3D alignment file, and look into fixing this.
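
As a rough illustration of the two halves of the problem (a header line with
no version, and code that assumes a _version attribute exists), here is a
minimal sketch. It is not the fix that went into Bio/Clustalw/__init__.py;
the helper names clustal_version and copy_version are hypothetical.

```python
import re


def clustal_version(header_line):
    """Return the version string from a CLUSTAL header, or None if absent."""
    if not header_line.startswith("CLUSTAL"):
        raise ValueError("Not a CLUSTAL header: %r" % header_line)
    # Look for dotted numbers such as 1.81 or 2.0.9, with or without
    # the surrounding brackets.
    match = re.search(r"\d+(?:\.\d+)+", header_line)
    return match.group(0) if match else None


def copy_version(source, target):
    """Copy the private _version attribute, tolerating its absence."""
    # getattr with a default avoids the AttributeError shown above.
    target._version = getattr(source, "_version", None)


if __name__ == "__main__":
    for line in [
        "CLUSTAL W (1.81) multiple sequence alignment",
        "CLUSTAL 2.0.9 multiple sequence alignment",
        "CLUSTAL format multiple sequence alignment by PROMALS3D",
    ]:
        print(repr(clustal_version(line)))
```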
From bugzilla-daemon at portal.open-bio.org Thu Aug 7 08:04:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 7 Aug 2008 08:04:03 -0400
Subject: [Biopython-dev] [Bug 2564] Bio.Clustalw parsing fails if CLUSTAL
files omit version number
In-Reply-To:
Message-ID: <200808071204.m77C435N026522@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2564
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-07 08:04 EST -------
Created an attachment (id=984)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=984&action=view)
CLUSTAL format alignment from PROMALS3D
Example file using a block size of 70 (this is an option on the PROMALS3D
website).
From bugzilla-daemon at portal.open-bio.org Thu Aug 7 08:29:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 7 Aug 2008 08:29:17 -0400
Subject: [Biopython-dev] [Bug 2564] Bio.Clustalw parsing fails if CLUSTAL
files omit version number
In-Reply-To:
Message-ID: <200808071229.m77CTHTL028003@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2564
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-07 08:29 EST -------
OK, this is fixed in CVS now. See file Bio/Clustalw/__init__.py revision 1.21
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Clustalw/__init__.py?cvsroot=biopython
I've also updated the unit tests, test_Clustalw.py and test_AlignIO.py to
include this PROMALS3D alignment as an input file.
From bugzilla-daemon at portal.open-bio.org Thu Aug 7 10:42:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 7 Aug 2008 10:42:13 -0400
Subject: [Biopython-dev] [Bug 2508] NCBIStandalone.blastall: provide support
for '-F F' and make it safe
In-Reply-To:
Message-ID: <200808071442.m77EgDVH001310@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2508
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|REOPENED |RESOLVED
Resolution| |FIXED
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-07 10:42 EST -------
CVS will now also reject command options containing the "<", ">" or "|"
characters.
Checking in Bio/Blast/NCBIStandalone.py;
/home/repository/biopython/biopython/Bio/Blast/NCBIStandalone.py,v <--
NCBIStandalone.py
new revision: 1.75; previous revision: 1.74
done
Checking in Tests/test_NCBIStandalone.py;
/home/repository/biopython/biopython/Tests/test_NCBIStandalone.py,v <--
test_NCBIStandalone.py
new revision: 1.16; previous revision: 1.15
done
Marking this bug as fixed.
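
For readers who want the general idea, a check of this kind might look
something like the sketch below. It is not the code in
Bio/Blast/NCBIStandalone.py; the function name check_option and the error
message are hypothetical, and only the three characters named above are
rejected.

```python
# Shell redirection and pipe characters that should never appear in a
# command line option value.
UNSAFE_CHARACTERS = ("<", ">", "|")


def check_option(value):
    """Return the option as a string, or raise ValueError if it is unsafe."""
    text = str(value)
    for char in UNSAFE_CHARACTERS:
        if char in text:
            raise ValueError(
                "Rejecting option %r: contains %r" % (text, char)
            )
    return text


if __name__ == "__main__":
    print(check_option("F"))
    try:
        check_option("results.txt > /tmp/other")
    except ValueError as err:
        print(err)
```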
From bugzilla-daemon at portal.open-bio.org Fri Aug 8 07:09:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 8 Aug 2008 07:09:32 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200808081109.m78B9W0h023052@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-08 07:09 EST -------
NBRF = National Biomedical Research Foundation
PIR = Protein Information Resource, a major project of the NBRF.
See http://pir.georgetown.edu/pirwww/dbinfo/pir_psd.shtml
> Database Description for PIR-PSD
>
> Release 80.00 (31 Dec 2004) is the final release for the
> PIR-International Protein Sequence Database (PIR-PSD), the
> world's first database of classified and functionally
> annotated protein sequences that grew out of the Atlas of
> Protein Sequence and Structure (1965-1978) edited by
> Margaret Dayhoff. Produced and distributed by the Protein
> Information Resource in collaboration with MIPS (Munich
> Information Center for Protein Sequences) and JIPID (Japan
> International Protein Information Database), PIR-PSD has
> been the most comprehensive and expertly-curated protein
> sequence database in the public domain for over 20 years.
> In 2002, PIR joined EBI (European Bioinformatics Institute)
> and SIB (Swiss Institute of Bioinformatics) to form the
> UniProt consortium. PIR-PSD sequences and annotations have
> been integrated into UniProt Knowledgebase. Bi-directional
> cross-references between UniProt (UniProt Knowledgebase
> and/or UniParc) and PIR-PSD are established to allow easy
> tracking of former PIR-PSD entries. PIR-PSD unique
> sequences, reference citations, and experimentally-verified
> data can now be found in the relevant UniProt records.
Given the PIR database itself is now part of UniProt, I wonder if there is
actually much need for the PIR/NBRF format any more?
Perhaps reading PIR files might still be useful, but I doubt there is any need
to support writing PIR files with Bio.SeqIO now.
From biopython at maubp.freeserve.co.uk Fri Aug 8 07:17:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 8 Aug 2008 12:17:04 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
<320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
Message-ID: <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
On Jul 22, 2008 Peter wrote:
> Bio.expressions was already deprecated, and seems to be a dependency
> of the following modules, which I have now explicitly deprecated in
> CVS:
>
> Bio.expressions (deprecated in Biopython 1.44)
> Bio.config
> Bio.dbdefs
> Bio.formatdefs
I suggest we leave these four modules (Bio.expressions, Bio.config,
Bio.dbdefs and Bio.formatdefs) in for one more release, and then
remove them.
> Moving on, Bio.Std and Bio.StdHandler appear to be used by:
> - Bio.expressions (deprecated in Biopython 1.44)
> - Bio.config (now deprecated in CVS)
> - Bio.builders (used by Mindy)
> - Bio.Mindy (used by Bio.config which is now deprecated)
>
> As far as I can tell, other historic usage of Mindy (e.g. in Bio.Fasta
> and Bio.GenBank) has already been deprecated and removed. I think it
> would therefore also be safe to deprecate these four together
> (Bio.expressions, Bio.config, Bio.builders and Bio.Mindy), or start by
> deprecating Bio.Mindy on its own.
I would like to deprecate Bio.Mindy and Bio.builders for the next
release. Are there any comments? Otherwise I'll post a message on
the main discussion list to check no-one objects there. Should we
push for even more "spring cleaning" for the next release? Another
candidate for deprecation is Bio.Decode, which doesn't seem to be used
by any other part of Biopython.
Peter
From bugzilla-daemon at portal.open-bio.org Fri Aug 8 07:37:41 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 8 Aug 2008 07:37:41 -0400
Subject: [Biopython-dev] [Bug 2535] Support for PIR / NBRF format in
Bio.SeqIO
In-Reply-To:
Message-ID: <200808081137.m78BbfXF024489@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2535
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-08 07:37 EST -------
Marking as fixed - I have checked in support for reading PIR format with
Bio.SeqIO (a cut down version of the above code). If anyone really wants to
support writing to PIR format that could be added later.
Checking in Tests/test_SeqIO.py;
/home/repository/biopython/biopython/Tests/test_SeqIO.py,v <-- test_SeqIO.py
new revision: 1.38; previous revision: 1.37
done
Checking in Tests/test_AlignIO.py;
/home/repository/biopython/biopython/Tests/test_AlignIO.py,v <--
test_AlignIO.py
new revision: 1.15; previous revision: 1.14
done
Checking in Tests/output/test_SeqIO;
/home/repository/biopython/biopython/Tests/output/test_SeqIO,v <-- test_SeqIO
new revision: 1.28; previous revision: 1.27
done
Checking in Tests/output/test_AlignIO;
/home/repository/biopython/biopython/Tests/output/test_AlignIO,v <--
test_AlignIO
new revision: 1.13; previous revision: 1.12
done
Checking in Bio/SeqIO/__init__.py;
/home/repository/biopython/biopython/Bio/SeqIO/__init__.py,v <-- __init__.py
new revision: 1.37; previous revision: 1.36
done
RCS file: /home/repository/biopython/biopython/Bio/SeqIO/PirIO.py,v
done
Checking in Bio/SeqIO/PirIO.py;
/home/repository/biopython/biopython/Bio/SeqIO/PirIO.py,v <-- PirIO.py
initial revision: 1.1
done
From biopython at maubp.freeserve.co.uk Fri Aug 8 11:56:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 8 Aug 2008 16:56:16 +0100
Subject: [Biopython-dev] Deprecating Bio.EUtils?
Message-ID: <320fb6e00808080856r1d8021dbhfd6192a48718afcd@mail.gmail.com>
The NCBI Entrez database has some "Entrez Programming Utilities"
called EUtils, http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
Biopython now has two modules for accessing this, Bio.Entrez (new) and
Bio.EUtils (old).
We have a couple of open bugs in Bio.EUtils,
http://bugzilla.open-bio.org/show_bug.cgi?id=2447
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
This module is currently unmaintained, and the original author Andrew
Dalke has indicated that he doesn't have the time to look after it.
The Bio.EUtils code is also (in my opinion) fairly complicated for any
newcomer to try and understand. On the bright side, it does have a
working unit test.
Given most (all?) of the NCBI's "Entrez Programming Utilities" aka
EUtils functionality is now supported by the newer (and much simpler)
Bio.Entrez module (which Michiel is maintaining), perhaps we should
deprecate Bio.EUtils for the next release.
Peter
From biopython at maubp.freeserve.co.uk Sat Aug 9 09:25:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 9 Aug 2008 14:25:19 +0100
Subject: [Biopython-dev] [BioPython] Bio.Medline parser
In-Reply-To: <576584.32208.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00808081032t1bf11e6fv32ed32ca4669cae5@mail.gmail.com>
<576584.32208.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00808090625l64851a0am8e56563aec16ba66@mail.gmail.com>
On the main mailing list Michiel and I wrote:
>> That sounds sensible. Maybe we should have an example
>> in the Tutorial of using Bio.Entrez to download some data
>> in the plain text Medline format, and parsing it with
>> Bio.Medline? And perhaps also an equivalent using the
>> XML Medline format parsed using Bio.Entrez?
>
> Done -- see CVS.
Great - I've updated CVS slightly (missing line break in one example,
and I expanded on the section introduction to mention the rettype and
retmode parameters for efetch explicitly).
There is an older PubMed/Medline example in the cookbook section which
uses the Bio.PubMed.Dictionary class. The Bio.PubMed.Dictionary
expects the old (now deprecated) parser from Bio.Medline, and also as
we discussed last month, these dictionary interfaces to the NCBI do
encourage the user to make a series of separate calls to Entrez
(rather than doing a linked search/retrieve with the web-history).
Perhaps Bio.PubMed.Dictionary should also be deprecated? Indeed could
the whole of Bio.PubMed be deprecated in favour of Bio.Entrez?
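
To illustrate what a linked search/retrieve with the web-history means at the
level of the underlying requests, here is a sketch that only builds the two
URLs and does not contact the NCBI. The parameter names (usehistory, WebEnv,
query_key, rettype, retmode) are the ones the E-utilities use, but the helper
names, the base URL constant and the placeholder WebEnv value are assumptions
for illustration. This is not how Bio.Entrez is implemented.

```python
from urllib.parse import urlencode

# Assumed base URL for the Entrez Programming Utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term):
    """URL for a search whose results are kept on the history server."""
    params = [("db", db), ("term", term), ("usehistory", "y")]
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def efetch_url(db, webenv, query_key, rettype, retmode,
               retstart=0, retmax=100):
    """URL for fetching one batch of the stored search results."""
    params = [
        ("db", db),
        ("WebEnv", webenv),        # session cookie returned by esearch
        ("query_key", query_key),  # which stored query to use
        ("retstart", retstart),
        ("retmax", retmax),
        ("rettype", rettype),
        ("retmode", retmode),
    ]
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    print(esearch_url("pubmed", "biopython"))
    # WEBENV_PLACEHOLDER stands in for the value esearch would return.
    print(efetch_url("pubmed", "WEBENV_PLACEHOLDER", "1", "medline", "text"))
```

The point is that one search plus a small number of batched fetches replaces
a separate request per record, which is what the dictionary style interfaces
encourage.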
The Bio.GenBank.NCBIDictionary class is a similar case, but here the
underlying GenBank parser framework is still present.
Peter
From bugzilla-daemon at portal.open-bio.org Sat Aug 9 10:20:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 9 Aug 2008 10:20:16 -0400
Subject: [Biopython-dev] [Bug 2561] SeqRecord format method to get a string
in a given file format
In-Reply-To:
Message-ID: <200808091420.m79EKGP9018151@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2561
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #13 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-09 10:20 EST -------
I've updated the tutorial to describe this new functionality briefly.
Checking in Tutorial.tex;
/home/repository/biopython/biopython/Doc/Tutorial.tex,v <-- Tutorial.tex
new revision: 1.139; previous revision: 1.138
done
I'm marking this bug as fixed now, but until the next release of Biopython, I am
still willing to discuss changing the new .format() method's name.
From bugzilla-daemon at portal.open-bio.org Sat Aug 9 14:53:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sat, 9 Aug 2008 14:53:16 -0400
Subject: [Biopython-dev] [Bug 2544] Bio.GenBank and SeqFeature improvements
In-Reply-To:
Message-ID: <200808091853.m79IrGW2028835@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2544
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-09 14:53 EST -------
I have updated Bio/SeqFeature.py in CVS to add __repr__ methods to the
SeqFeature object and the locations.
Checking in SeqFeature.py;
/home/repository/biopython/biopython/Bio/SeqFeature.py,v <-- SeqFeature.py
new revision: 1.12; previous revision: 1.11
done
As an example of the sort of output you now get:
>>> from Bio import SeqIO
>>> record = SeqIO.read(open("AE017199.gbk"), "genbank")
>>> record.features[0]
Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(Bio.SeqFeature.ExactPosition(0),Bio.SeqFeature.ExactPosition(490885)),
type='source', strand=1)
>>> print record.features[0]
type: source
location: [0:490885]
ref: None:None
strand: 1
qualifiers:
Key: db_xref, Value: ['taxon:228908']
Key: mol_type, Value: ['genomic DNA']
Key: organism, Value: ['Nanoarchaeum equitans Kin4-M']
>>> print record.features[-1]
type: CDS
location: [486422:486962]
ref: None:None
strand: -1
qualifiers:
Key: codon_start, Value: ['1']
Key: db_xref, Value: ['GI:40069056']
Key: locus_tag, Value: ['NEQ550']
Key: note, Value: ['hypothetical protein']
Key: product, Value: ['NEQ550']
Key: protein_id, Value: ['AAR39391.1']
Key: transl_table, Value: ['11']
Key: translation, Value:
['MLELLAGFKQSILYVLAQFKKPEYATSYTIKLVNPFYYISDSLNVITSTKEDKVNYKVSLSDIAFDFPFKFPIVAIVEGKANREFTFIIDRQNKKLSYDLKKGIIYIQDATIIPNGIKITVNGLAELKNIKINPNDPSITVQKVVGEQNTYIIKTSKDSVKITISADFVVKAEKWLFIQ']
>>> record.features[-1]
Bio.SeqFeature.SeqFeature(Bio.SeqFeature.FeatureLocation(Bio.SeqFeature.ExactPosition(486422),Bio.SeqFeature.ExactPosition(486962)),
type='CDS', strand=-1)
I have also updated Bio/SeqRecord.py so that the __str__ method of the
SeqRecord reports the number of features,
/home/repository/biopython/biopython/Bio/SeqRecord.py,v <-- SeqRecord.py
new revision: 1.19; previous revision: 1.18
done
For example,
>>> from Bio import SeqIO
>>> record = SeqIO.read(open("AE017199.gbk"), "genbank")
>>> print record
ID: AE017199.1
Name: AE017199
Description: Nanoarchaeum equitans Kin4-M, complete genome.
Number of features: 1107
/comment=On Dec 18, 2003 this sequence version replaced gi:37777680.
/sequence_version=1
/source=Nanoarchaeum equitans Kin4-M
/taxonomy=['Archaea', 'Nanoarchaeota', 'Nanoarchaeum']
/keywords=['']
/references=[,
]
/accessions=['AE017199', 'AACL01000000', 'AACL01000001']
/data_file_division=BCT
/date=22-DEC-2003
/organism=Nanoarchaeum equitans Kin4-M
/gi=40068520
Seq('TCTCGCAGAGTTCTTTTTTGTATTAACAAACCCAAAACCCATAGAATTTAATGA...TTA',
IUPACAmbiguousDNA())
Still to do: Defining __repr__ for the Bio.SeqFeature.Reference object (and
perhaps tweaking the display of the references in the SeqRecord __str__
method).
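
For anyone unfamiliar with the split between the two methods, the pattern is
roughly as sketched below: __repr__ gives an unambiguous, constructor-like
string while __str__ gives the compact human readable form. These are toy
classes written for this note, not the code in Bio/SeqFeature.py, and the
module prefix seen in the real output above is deliberately left out.

```python
class ExactPosition(object):
    """Toy position class (not Bio.SeqFeature.ExactPosition)."""

    def __init__(self, position):
        self.position = position

    def __repr__(self):
        # Constructor-like form, useful at the interactive prompt.
        return "%s(%r)" % (self.__class__.__name__, self.position)

    def __str__(self):
        return str(self.position)


class FeatureLocation(object):
    """Toy location class built from two positions."""

    def __init__(self, start, end):
        self.start = start
        self.end = end

    def __repr__(self):
        # Nested objects are shown with their own repr via %r.
        return "%s(%r,%r)" % (self.__class__.__name__, self.start, self.end)

    def __str__(self):
        # Compact slice-like notation for printing.
        return "[%s:%s]" % (self.start, self.end)


if __name__ == "__main__":
    location = FeatureLocation(ExactPosition(0), ExactPosition(490885))
    print(repr(location))
    print(location)
```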
From bugzilla-daemon at portal.open-bio.org Sun Aug 10 01:25:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 10 Aug 2008 01:25:50 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200808100525.m7A5Po18016256@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
------- Comment #29 from mdehoon at ims.u-tokyo.ac.jp 2008-08-10 01:25 EST -------
The Medline parser was further updated; see
http://lists.open-bio.org/pipermail/biopython/2008-August/004385.html
on the mailing list.
From mjldehoon at yahoo.com Sun Aug 10 01:23:55 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sat, 9 Aug 2008 22:23:55 -0700 (PDT)
Subject: [Biopython-dev] [BioPython] Bio.Medline parser
In-Reply-To: <320fb6e00808090625l64851a0am8e56563aec16ba66@mail.gmail.com>
Message-ID: <635812.43918.qm@web62405.mail.re1.yahoo.com>
--- On Sat, 8/9/08, Peter wrote:
> Indeed could the whole of Bio.PubMed be deprecated in favour of
> Bio.Entrez?
I think Bio.PubMed can be deprecated if the tutorial shows some clear examples of how to achieve the same functionality with Bio.Entrez.
>
> The Bio.GenBank.NCBIDictionary class is a similar case, but
> here the underlying GenBank parser framework is still present.
>
Bio.GenBank.NCBIDictionary should be deprecated, as it encourages accessing Genbank in a way inconsistent with NCBI's guidelines.
I could take sections 9.1 and 9.2 in the tutorial and convert them into examples for the Bio.Entrez chapter (section 7.10).
--Michiel.
From bugzilla-daemon at portal.open-bio.org Sun Aug 10 03:07:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 10 Aug 2008 03:07:05 -0400
Subject: [Biopython-dev] [Bug 2454] Iterators can't use file-like objects
In-Reply-To:
Message-ID: <200808100707.m7A775CZ019318@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2454
mdehoon at ims.u-tokyo.ac.jp changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|ASSIGNED |RESOLVED
Resolution| |FIXED
------- Comment #30 from mdehoon at ims.u-tokyo.ac.jp 2008-08-10 03:07 EST -------
All parsers have now been updated; closing this bug.
From biopython at maubp.freeserve.co.uk Tue Aug 12 10:30:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 12 Aug 2008 15:30:57 +0100
Subject: [Biopython-dev] Deprecating Bio.EUtils?
In-Reply-To: <320fb6e00808080856r1d8021dbhfd6192a48718afcd@mail.gmail.com>
References: <320fb6e00808080856r1d8021dbhfd6192a48718afcd@mail.gmail.com>
Message-ID: <320fb6e00808120730x1be5572ck72cefeaad90ef37e@mail.gmail.com>
On Fri, Aug 8, 2008 at 4:56 PM, Peter wrote:
> The NCBI Entrez database has some "Entrez Programming Utilities"
> called EUtils, http://eutils.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
>
> Biopython now has two modules for accessing this, Bio.Entrez (new) and
> Bio.EUtils (old).
>
> We have a couple of open bugs in Bio.EUtils,
> http://bugzilla.open-bio.org/show_bug.cgi?id=2447
> http://bugzilla.open-bio.org/show_bug.cgi?id=2448
>
> This module is currently unmaintained, and the original author Andrew
> Dalke has indicated that he doesn't have the time to look after it.
> The Bio.EUtils code is also (in my opinion) fairly complicated for any
> newcomer to try and understand. On the bright side, it does have a
> working unit test.
>
> Given most (all?) of the NCBI's "Entrez Programming Utilities" aka
> EUtils functionality is now supported by the newer (and much simpler)
> Bio.Entrez module (which Michiel is maintaining), perhaps we should
> deprecate Bio.EUtils for the next release.
Assuming no one has any comments, I'll post a similar email on the
main list to see if anyone there is currently using Bio.EUtils.
Peter
From biopython at maubp.freeserve.co.uk Wed Aug 13 07:23:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 13 Aug 2008 12:23:17 +0100
Subject: [Biopython-dev] Removing the deprecated Bio.WWW modules
In-Reply-To: <320fb6e00807250421w15b1d8a9qe9d5d178c233ec7b@mail.gmail.com>
References: <320fb6e00807240441g5b21993dl7c84aebac0e2a988@mail.gmail.com>
<502434.4415.qm@web62406.mail.re1.yahoo.com>
<320fb6e00807250421w15b1d8a9qe9d5d178c233ec7b@mail.gmail.com>
Message-ID: <320fb6e00808130423j31b1b7b7wc85089a374e097a6@mail.gmail.com>
>> Note that Bio.WWW.__init__.py contains some code that is used in other modules.
>> Most (but not all) of these modules are deprecated themselves. For the
>> non-deprecated modules, it's probably easiest to just copy the code from
>> Bio.WWW.__init__.py over to avoid having to import Bio.WWW.
>
> Good catch - I didn't do my recursive grep correctly. The file
> Bio/WWW/__init__.py just contains a RequestLimiter class, and this is
> currently used in:
>
> Bio/Blast/NCBIWWW.py (used in qblast, simple to recode as in Bio.Entrez)
> Bio/config/_support.py (completely deprecated)
> Bio/Prosite/__init__.py (in the deprecated ExPASyDictionary class)
> Bio/SwissProt/SProt.py (in the deprecated ExPASyDictionary class)
I updated Bio/Blast/NCBIWWW.py in CVS revision 1.52 not to use
Bio.WWW, meaning that the Bio.WWW modules are no longer used by any
current Biopython modules. I then added a deprecation warning to
Bio/WWW/__init__.py in CVS revision 1.9.
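
For context, a request limiter of this general kind just enforces a minimum
delay between successive calls to a remote server. The sketch below shows the
idea only; it is not the RequestLimiter from Bio/WWW/__init__.py nor the code
now in Bio/Blast/NCBIWWW.py, and the constructor arguments (including the
injectable clock and sleep functions, added here to make it easy to test) are
assumptions.

```python
import time


class RequestLimiter(object):
    """Enforce a minimum number of seconds between successive requests."""

    def __init__(self, delay, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last_time = None  # time of the previous request, if any

    def wait(self):
        """Block until at least `delay` seconds have passed since last call."""
        now = self._clock()
        if self._last_time is not None:
            remaining = self.delay - (now - self._last_time)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_time = now


if __name__ == "__main__":
    limiter = RequestLimiter(0.2)
    for attempt in range(3):
        limiter.wait()  # call this just before each remote request
        print("request", attempt)
```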
> Note I have just updated Bio.Prosite and Bio.SwissProt to use
> Bio.ExPASy rather than Bio.WWW.ExPASy which means we can delete the
> deprecated Bio/WWW/ExPASy.py, InterPro.py, NCBI.py and SCOP.py now.
I have also removed the four deprecated files Bio/WWW/ExPASy.py,
InterPro.py, NCBI.py and SCOP.py.
Peter
From biopython at maubp.freeserve.co.uk Wed Aug 13 08:44:36 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 13 Aug 2008 13:44:36 +0100
Subject: [Biopython-dev] Biopython to begin transition to Subversion
In-Reply-To: <20080213100047.GA18695@inb.uni-luebeck.de>
References: <320fb6e00802110955s57cba8c4p3e0a9fc9f9bff7e7@mail.gmail.com>
<473614.54149.qm@web62415.mail.re1.yahoo.com>
<320fb6e00802130119t6c95bd28u22b94ecfebfd3ad9@mail.gmail.com>
<20080213100047.GA18695@inb.uni-luebeck.de>
Message-ID: <320fb6e00808130544h51c3a219xc333d7d080a01faf@mail.gmail.com>
>> How strange - for me it asked for my password three times (this was
>> the issue I had emailed Chris about directly; also establishing that
>> yes, the same accounts and passwords were being used as in CVS). I
>> hope they can sort this out...
>
> That's about normal and just the way svn works. I don't know the
> details, but AFAIK svn connects multiple times to the repo: Version
> Checking for changes, downloading data, etc. -- every operation needs
> a separate authentication. Quite stupid, in a way.
>
> You might want to generate a public key and add it to the
> ~/.ssh/authorized_keys file on the server - you won't be asked for a
> password any more.
I just stumbled on an OBF wiki page which goes into a little more
detail about why there are multiple password prompts:
http://www.open-bio.org/wiki/SVN-Developers
Peter
From bugzilla-daemon at portal.open-bio.org Wed Aug 13 12:01:31 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 13 Aug 2008 12:01:31 -0400
Subject: [Biopython-dev] [Bug 2469] requires_wise.py fails on Windows (test
suite)
In-Reply-To:
Message-ID: <200808131601.m7DG1VwY027024@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2469
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-13 12:01 EST -------
Fixed by updating Tests/requires_wise.py in CVS revision 1.4 to fail gracefully
on Windows.
This is a pragmatic solution, as it does not address the question of getting
the unit test to work on Windows.
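
The general shape of such a "fail gracefully" check is sketched here. It is
not the contents of Tests/requires_wise.py; the exception class is a local
stand-in defined only for this example, and the function name and message
are hypothetical.

```python
import sys


class MissingExternalDependencyError(Exception):
    """Local stand-in for the error a test framework treats as 'skip'."""


def require_non_windows(tool, platform=None):
    """Raise a skip-style error if running on Windows."""
    if platform is None:
        platform = sys.platform
    if platform == "win32":
        raise MissingExternalDependencyError(
            "Skipping %s tests: not supported on Windows" % tool
        )


if __name__ == "__main__":
    try:
        require_non_windows("Wise2")
        print("Platform looks fine, tests can run")
    except MissingExternalDependencyError as err:
        print(err)
```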
From bugzilla-daemon at portal.open-bio.org Wed Aug 13 13:12:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 13 Aug 2008 13:12:26 -0400
Subject: [Biopython-dev] [Bug 2550] Alphabet problems when adding sequences
In-Reply-To:
Message-ID: <200808131712.m7DHCQGn029982@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2550
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-13 13:12 EST -------
Changes checked into CVS, new behaviour:
>>> from Bio import Alphabet
>>> from Bio.Alphabet import IUPAC
>>> from Bio.Seq import Seq
>>> a = Seq("ACTG", Alphabet.generic_dna)
>>> b = Seq("AC-TG", Alphabet.Gapped(Alphabet.generic_dna, "-"))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> a
Seq('ACTG', DNAAlphabet())
>>> b
Seq('AC-TG', Gapped(DNAAlphabet(), '-'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> b+c
Seq('AC-TGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> a+b
Seq('ACTGAC-TG', Gapped(DNAAlphabet(), '-'))
>>> p = Seq("ACDEFG", Alphabet.generic_protein)
>>> q = Seq("ACDEFG", IUPAC.protein)
>>> r = Seq("ACDEFG*", Alphabet.HasStopCodon(IUPAC.protein, "*"))
>>> p
Seq('ACDEFG', ProteinAlphabet())
>>> q
Seq('ACDEFG', IUPACProtein())
>>> r
Seq('ACDEFG*', HasStopCodon(IUPACProtein(), '*'))
>>> p+q
Seq('ACDEFGACDEFG', ProteinAlphabet())
>>> p+r
Seq('ACDEFGACDEFG*', HasStopCodon(ProteinAlphabet(), '*'))
>>> c = Seq("AC-TG", Alphabet.Gapped(IUPAC.unambiguous_dna, "-"))
>>> d = Seq('AC.TG', Alphabet.Gapped(IUPAC.unambiguous_dna, '.'))
>>> c
Seq('AC-TG', Gapped(IUPACUnambiguousDNA(), '-'))
>>> d
Seq('AC.TG', Gapped(IUPACUnambiguousDNA(), '.'))
>>> c+d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "Bio/Seq.py", line 137, in __add__
a = Alphabet._consensus_alphabet([self.alphabet, other.alphabet])
File "/Users/pjcock/repositories/biopython/Bio/Alphabet/__init__.py", line 200, in _consensus_alphabet
raise ValueError("More than one gap character present")
ValueError: More than one gap character present
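The gap-character rule behind this traceback can be sketched in plain Python. This is only an illustration and not the code in Bio/Alphabet/__init__.py: the function name consensus_gap_char and the use of None to mean "ungapped" are assumptions made for the example.

```python
def consensus_gap_char(gap_chars):
    """Return the single gap character shared by several alphabets.

    gap_chars holds one entry per alphabet: a gap character such as "-",
    or None for an ungapped alphabet (a hypothetical representation).
    """
    # Ungapped alphabets place no constraint on the result, so skip them.
    distinct = {gap for gap in gap_chars if gap is not None}
    if len(distinct) > 1:
        # Same message as the ValueError in the traceback above.
        raise ValueError("More than one gap character present")
    # Either exactly one gap character was seen, or none at all.
    return distinct.pop() if distinct else None
```

Under this sketch, mixing an ungapped alphabet with a "-" gapped one gives "-", while mixing "-" and "." raises the ValueError, which matches the a+b and c+d cases above.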
Checking in Tests/test_seq.py;
/home/repository/biopython/biopython/Tests/test_seq.py,v <-- test_seq.py
new revision: 1.16; previous revision: 1.15
done
Checking in Tests/test_GACrossover.py;
/home/repository/biopython/biopython/Tests/test_GACrossover.py,v <--
test_GACrossover.py
new revision: 1.3; previous revision: 1.2
done
Checking in Tests/test_GAMutation.py;
/home/repository/biopython/biopython/Tests/test_GAMutation.py,v <--
test_GAMutation.py
new revision: 1.2; previous revision: 1.1
done
Checking in Tests/test_GASelection.py;
/home/repository/biopython/biopython/Tests/test_GASelection.py,v <--
test_GASelection.py
new revision: 1.2; previous revision: 1.1
done
Checking in Tests/output/test_seq;
/home/repository/biopython/biopython/Tests/output/test_seq,v <-- test_seq
new revision: 1.14; previous revision: 1.13
done
Checking in Bio/Seq.py;
/home/repository/biopython/biopython/Bio/Seq.py,v <-- Seq.py
new revision: 1.30; previous revision: 1.29
done
Checking in Bio/Alphabet/__init__.py;
/home/repository/biopython/biopython/Bio/Alphabet/__init__.py,v <--
__init__.py
new revision: 1.10; previous revision: 1.9
done
Note that the GA unit tests needed a minor change in order that their bespoke
alphabet objects inherited from a Bio.Alphabet object.
From biopython at maubp.freeserve.co.uk Fri Aug 15 12:28:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 15 Aug 2008 17:28:21 +0100
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
Message-ID: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
Hello,
This is a slightly long email covering what to do with the online code
in Bio.PubMed and Bio.GenBank, and how to make Bio.Entrez easier to
use. All these modules are essentially wrapping access to the NCBI
Entrez database via the Entrez Programming Utilities (EUtils).
http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
The Bio.PubMed module is now essentially a wrapper for Bio.Entrez,
offering a simple and useful subset of functionality.
e.g.
>>> from Bio import PubMed
>>> pubmed_id_list = PubMed.search_for("orchids")
>>> print pubmed_id_list
['18701671', '18687799', '18627489', '18627452', '18586527', ...., '17751120']
>>> print len(pubmed_id_list)
226
(I've included a Bio.Entrez version at the end of this email)
While this works fine, there is (currently) no way to provide your
email address to the NCBI as they encourage (in case they need to
contact you). We could add this as another optional argument I
suppose.
Also, if you then want to download some or all of these records (say
as MedLine format files to parse with Bio.Medline), doing this with
Bio.PubMed.download_many() or the Dictionary class does not take
advantage of the NCBI's history system (as they encourage). There are
similar concerns with the Bio.GenBank.search_for(), download_many()
and NCBIDictionary classes.
There is simply no way with the current decoupling of the search_for()
and downloading functions to employ the EUtils session history, so
while they are nice and fairly easy to program with they do actively
discourage users from following the NCBI's preferred usage for large
downloads.
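To make the history idea concrete, here is a rough sketch of the URLs a linked search and batched download would use. It only builds URLs and does not contact the NCBI. The parameter names (usehistory, WebEnv, query_key, retstart, retmax) are the EUtils ones; the function names, the batch size, and the base URL (the current NCBI address, not the 2008 one) are assumptions for illustration, and the code is written for Python 3.

```python
from urllib.parse import urlencode

# Assumed base address for the EUtils scripts.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term, email):
    """URL for a search whose results are kept on the NCBI history server."""
    params = {"db": db, "term": term, "usehistory": "y", "email": email}
    return EUTILS + "esearch.fcgi?" + urlencode(params)


def efetch_urls(db, webenv, query_key, count, batch_size=100, rettype="medline"):
    """Yield one efetch URL per batch, referring back to the stored search."""
    for start in range(0, count, batch_size):
        params = {
            "db": db,
            "WebEnv": webenv,        # session cookie returned by esearch
            "query_key": query_key,  # identifies the search within the session
            "retstart": start,       # offset of the first record in this batch
            "retmax": batch_size,    # maximum number of records in this batch
            "rettype": rettype,
            "retmode": "text",
        }
        yield EUTILS + "efetch.fcgi?" + urlencode(params)
```

The WebEnv and query_key values would come from the parsed esearch result, which is exactly the link that separate search_for() and download_many() calls cannot carry.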
You can do a linked search/retrieve using Bio.Entrez as documented in
our tutorial for an esearch/efetch example using nucleotide sequences.
This is currently done as the last example in the chapter, so I'm
considering making this topic a little more high profile (and moving
it before the examples).
In addition to encouraging the use of Bio.Entrez by documenting it
prominently in the tutorial, we could go further and deprecate the
"user friendly" Bio.PubMed and Bio.GenBank wrapper functions. What do
people think of this? Deprecating the Dictionary classes in
particular could be a good idea as they use the old fashioned parser
objects.
I also think it would help to make Bio.Entrez a little easier to use.
One suggestion I made back in June was to include alternative versions
of the EUtils functions which also parse the XML using
Bio.Entrez.read():
http://portal.open-bio.org/pipermail/biopython-dev/2008-June/003859.html
I rather liked Andrew Dalke's naming idea, that
Entrez.read(Entrez.esearch(...)) becomes Entrez.search(...) etc:
http://portal.open-bio.org/pipermail/biopython-dev/2008-June/003861.html
Returning to my earlier example, right now you can write:
>>> from Bio import Entrez
>>> entrez_id_list = Entrez.read(Entrez.esearch(db="pubmed", term="orchids", retmax="300", \
email="A.N.Other at Example.com"))["IdList"]
>>> len(entrez_id_list)
226
I think it would be a usability improvement to do:
>>> from Bio import Entrez
>>> entrez_id_list = Entrez.search(db="pubmed", term="orchids", retmax="300", \
email="A.N.Other at Example.com")["IdList"]
This is still more complicated than the Bio.PubMed example above, but
not by as much.
In pseudocode, the implementation would be something like this:
def search(...):
    """Calls esearch requesting XML output and parses it."""
    return read(esearch(..., retmode="XML"))
Alternatively, Michiel had suggested having the Bio.Entrez.e*
functions automatically parse the output depending on their arguments,
but I'm not keen on this.
Peter
From bugzilla-daemon at portal.open-bio.org Sun Aug 17 09:05:57 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 17 Aug 2008 09:05:57 -0400
Subject: [Biopython-dev] [Bug 2448] Bio.EUtils can't handle accented author
names
In-Reply-To:
Message-ID: <200808171305.m7HD5vdf019408@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2448
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |WONTFIX
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-17 09:05 EST -------
We are deprecating Bio.EUtils in favour of Bio.Entrez (for an example of how to
use this, see comment 2 in this bug).
From bugzilla-daemon at portal.open-bio.org Sun Aug 17 09:05:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 17 Aug 2008 09:05:59 -0400
Subject: [Biopython-dev] [Bug 2447] EUtils cannot parse PubMed XML for ACS
journals
In-Reply-To:
Message-ID: <200808171305.m7HD5xQk019423@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2447
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |WONTFIX
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-17 09:05 EST -------
We are deprecating Bio.EUtils in favour of Bio.Entrez (for an example of how to
use this, see comment 3 in this bug).
From biopython at maubp.freeserve.co.uk Sun Aug 17 10:19:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 17 Aug 2008 15:19:50 +0100
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
References: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
Message-ID: <320fb6e00808170719s1edeb491g883e9b3db4437e67@mail.gmail.com>
> Also, if you then want to download some or all of these records (say
> as MedLine format files to parse with Bio.Medline), doing this with
> Bio.PubMed.download_many() or the Dictionary class does not take
> advantage of the NCBI's history system (as they encourage). There are
> similar concerns with the Bio.GenBank.search_for(), download_many()
> and NCBIDictionary classes.
I have just converted Bio.GenBank.search_for() from using Bio.EUtils
to Bio.Entrez, and then afterwards realised I could have copied a lot
of this code from Bio.PubMed.search_for(). However, it was
interesting to see how my code differed.
A few things occurred to me after doing this. Firstly, both these
search_for() functions take a few optional parameters which default to
None, and have to take explicit steps not to pass these None arguments
to Bio.Entrez.esearch() because currently they would wrongly get used
in the URL. It might make sense to modify Bio.Entrez._open() to skip
None arguments when building the URL.
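As a minimal sketch of that idea (not the actual Bio.Entrez._open() code; the helper name build_query is made up for this example, and it is written for Python 3), the None values can be filtered out before the query string is encoded:

```python
from urllib.parse import urlencode


def build_query(**params):
    """Encode keyword arguments as a URL query string, dropping None values."""
    # Keep only the options the caller actually supplied.
    supplied = {key: value for key, value in params.items() if value is not None}
    return urlencode(supplied)
```

Without the filter, urlencode() converts None to the literal text "None", so an unused reldate argument would end up in the URL as reldate=None.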
Secondly, in my testing of the date restriction arguments (reldate,
mindate and maxdate) the URL was constructed correctly, but the
searches returned no hits. Indeed, there is a comment in the
Bio.PubMed source code (revision 1.4, Jeff Chang in 2003):
XXX The date parameters don't seem to be working with NCBI's
script. Please let me know if you can get it to work.
It looks like I'm not the only one to find this (I was using the
nucleotide database instead of pubmed). If someone can confirm this
(e.g. URL testing in a browser) we can ask the NCBI about it.
Thirdly, assuming we don't deprecate it, perhaps
Bio.PubMed.search_for() should just use Bio.Entrez.read() to parse the
XML rather than its own mini-parser?
Finally, perhaps Bio.Entrez needs its own version of search_for() which
would parse the XML results into a list of IDs, and download them in
batches. However, this might be best done in combination with some
history helper functions to make a combined esearch and efetch easier,
which is a bigger job.
Peter
From biopython at maubp.freeserve.co.uk Mon Aug 18 07:35:01 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 18 Aug 2008 12:35:01 +0100
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
References: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
Message-ID: <320fb6e00808180435s25a5d40cl54e19a3b3ef2cf06@mail.gmail.com>
> You can do a linked search/retrieve using Bio.Entrez as documented in
> our tutorial for an esearch/efetch example using nucleotide sequences.
> This is currently done as the last example in the chapter, so I'm
> considering making this topic a little more high profile (and moving
> it before the examples).
I've made this example into a full section, and linked to it more
prominently in the earlier sections.
> In addition to encouraging the use of Bio.Entrez by documenting it
> prominently in the tutorial,
The tutorial in CVS has been updated to cover using PubMed with
Bio.Entrez, and now doesn't mention the search or download functions
in Bio.PubMed or Bio.GenBank.
> ... we could go further and deprecate the "user friendly" Bio.PubMed
> and Bio.GenBank wrapper functions. What do people think of this?
> Deprecating the Dictionary classes in particular could be a good idea
> as they use the old fashioned parser objects.
Bio.Entrez in CVS will now issue a warning if an email address has not
been supplied, in order to encourage user compliance with the NCBI
guidelines. Does anyone think this is too harsh? There is a small
risk this will encourage people to use a dummy email address just to
silence the warning...
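For discussion, the behaviour could look roughly like the sketch below. This is not the code checked into CVS: the function name with_email, the default_email argument (standing in for a module-level setting) and the wording of the warning are all assumptions.

```python
import warnings


def with_email(params, default_email=None):
    """Return a copy of params with an email address filled in if possible.

    default_email plays the role of a module-level setting; an explicit
    "email" entry in params always takes priority over it.
    """
    if "email" in params:
        return dict(params)
    if default_email is not None:
        return dict(params, email=default_email)
    # No address available: warn, but still let the request go ahead.
    warnings.warn(
        "No email address given; the NCBI asks for one so that they "
        "can contact you if there is a problem.",
        UserWarning,
    )
    return dict(params)
```

Because this is a warning rather than an exception, existing scripts keep working, and users who object can silence it with the standard warnings filters.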
Also, as a side effect of this change, the wrapper functions in
Bio.PubMed and Bio.GenBank will now trigger this new warning (unless
an email address has been setup in Entrez), e.g.
>>> from Bio import Entrez, PubMed
>>> Entrez.email = "A.N.Other at example.com"
>>> pubmed_id_list = PubMed.search_for("orchids")
>>> print pubmed_id_list
['18701671', '18687799', '18627489', '18627452', '18586527', ...., '17751120']
>>> print len(pubmed_id_list)
226
Peter
From mjldehoon at yahoo.com Mon Aug 18 19:37:29 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 18 Aug 2008 16:37:29 -0700 (PDT)
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808180435s25a5d40cl54e19a3b3ef2cf06@mail.gmail.com>
Message-ID: <620529.69100.qm@web62402.mail.re1.yahoo.com>
> Bio.Entrez in CVS will now issue a warning if an email
> address has not
> been supplied, in order to encourage user compliance with
> the NCBI
> guidelines. Does anyone think this is too harsh? There is
> a small
> risk this will encourage people to use a dummy email
> address just to
> silence the warning...
If it's too harsh, we can remove the warning message.
I do think though that having one Bio.Entrez.email is better than having to specify the email address on each call to Entrez.
--Michiel
From mjldehoon at yahoo.com Mon Aug 18 19:46:29 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 18 Aug 2008 16:46:29 -0700 (PDT)
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808170719s1edeb491g883e9b3db4437e67@mail.gmail.com>
Message-ID: <530054.11100.qm@web62401.mail.re1.yahoo.com>
--- On Sun, 8/17/08, Peter wrote:
> Thirdly, assuming we don't deprecate it, perhaps
> Bio.PubMed.search_for() should just use Bio.Entrez.read()
> to parse the
> XML rather than its own mini-parser?
Now that Bio.Entrez is available, the mini-parser in Bio.PubMed is no longer needed.
> Finally, perhaps Bio.Entrez needs its own version of
> search_for() which
> would parse the XML results into a list of IDs, and
> download them in
> batches. However, this might be best done in
> combination with some
> history helper functions to make a combined esearch and
> efetch easier,
> which is a bigger job.
It is not entirely clear to me if a search_for function (in Bio.PubMed, Bio.GenBank, or Bio.Entrez) is a good idea. The search_for function provides a higher-level interface to the low-level functionality in Entrez. But there is a reason that Entrez only provides low-level functions: it cannot provide higher-level functions without knowing what the user wants. We as biopython don't know much more than Entrez (except that they'll want to parse the result using Python).
Maybe I'm being too pessimistic, but I think the result will be either an over-engineered function that tries to cater to all possible user wishes, or a more straightforward function that is useful only for a minority of users.
--Michiel.
From biopython at maubp.freeserve.co.uk Tue Aug 19 05:07:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 19 Aug 2008 10:07:47 +0100
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <530054.11100.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00808170719s1edeb491g883e9b3db4437e67@mail.gmail.com>
<530054.11100.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00808190207t4bf6bbe8v9d41ef4223b7ff48@mail.gmail.com>
> I do think though that having one Bio.Entrez.email is better
> than having to specify the email address on each call to Entrez.
I agree with you on this change, having one place to set the email
address should make using Bio.Entrez and following the NCBI guidelines
much easier.
On Tue, Aug 19, 2008 at 12:46 AM, Michiel de Hoon wrote:
>> Thirdly, assuming we don't deprecate it, perhaps
>> Bio.PubMed.search_for() should just use Bio.Entrez.read()
>> to parse the XML rather than its own mini-parser?
>
> Now that Bio.Entrez is available, the mini-parser in Bio.PubMed is no longer needed.
>
OK. It's not urgent but worth doing.
>> Finally, perhaps Bio.Entrez needs its own version of
>> search_for() which would parse the XML results into a
>> list of IDs, and download them in batches. However,
>> this might be best done in combination with some
>> history helper functions to make a combined esearch
>> and efetch easier, which is a bigger job.
>
> It is not entirely clear to me if a search_for function
> (in Bio.PubMed, Bio.GenBank, or Bio.Entrez) is a good
> idea. The search_for function provides a higher-level
> interface to the low-level functionality in Entrez. But
> there is a reason that Entrez only provides low-level
> functions: it cannot provide higher-level functions
> without knowing what the user wants. We as biopython
> don't know much more than Entrez (except that they'll
> want to parse the result using Python).
You are right if we are talking about all possible uses of Entrez.
> Maybe I'm being too pessimistic, but I think the result
> will be either an over-engineered function that tries to
> cater to all possible user wishes, or a more
> straightforward function that is useful only for a minority
> of users.
I was thinking that the "search for some sequences and then download
them" task might be a common enough and straightforward enough task to
warrant a simple helper function. However, as I haven't yet made any
serious use of the Entrez module in real code, I may not be the best
person to judge this (I prefer to download multiple genomes
automatically by FTP). We can opt to wait and see what user feedback
we get from Bio.Entrez users I guess.
Peter
From biopython at maubp.freeserve.co.uk Tue Aug 19 06:52:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 19 Aug 2008 11:52:28 +0100
Subject: [Biopython-dev] Unit tests for deprecated modules?
Message-ID: <320fb6e00808190352sd6437e0qb2898e39b15287b3@mail.gmail.com>
Are there any strong views about when to remove unit tests for
deprecated modules? I can see two main approaches:
(a) Remove the unit test when the code is deprecated, as this avoids
warning messages from the test suite.
(b) Remove the unit test only when the deprecated code is actually
removed, as continuing to test the code will catch any unexpected
breakage of the deprecated code.
I lean towards (b), but wondered what other people think.
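For what it is worth, option (b) need not mean a noisy test run. Below is a sketch of a test that keeps exercising deprecated code while hiding its DeprecationWarning; it assumes a plain unittest-style test (not one of our print-and-compare tests), and old_function is a made-up stand-in for something in a deprecated module.

```python
import unittest
import warnings


def old_function():
    """Stand-in for a function from a deprecated module (hypothetical)."""
    warnings.warn("old_function is deprecated", DeprecationWarning, stacklevel=2)
    return 42


class TestDeprecated(unittest.TestCase):
    def test_still_works(self):
        # Hide DeprecationWarning only, and only inside this block; the
        # previous warning filters are restored when the block exits.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore", DeprecationWarning)
            self.assertEqual(old_function(), 42)
```

Any other warning, and any real failure, would still show up in the test output as usual.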
Peter
From mjldehoon at yahoo.com Tue Aug 19 09:26:12 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 19 Aug 2008 06:26:12 -0700 (PDT)
Subject: [Biopython-dev] Unit tests for deprecated modules?
In-Reply-To: <320fb6e00808190352sd6437e0qb2898e39b15287b3@mail.gmail.com>
Message-ID: <655135.23748.qm@web62401.mail.re1.yahoo.com>
I would say (a). In my opinion, deprecated means that the module is in essence no longer part of Biopython; we just keep it around to give people time to change. Also, deprecation warnings distract from real warnings and errors in the unit tests, are likely to confuse users, and give the impression that Biopython is not clean. I don't remember a case where we had to resurrect a deprecated module, so we may as well remove the unit test right away.
--Michiel
--- On Tue, 8/19/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] Unit tests for deprecated modules?
> To: "BioPython-Dev Mailing List"
> Date: Tuesday, August 19, 2008, 6:52 AM
> Are there any strong views about when to remove unit tests
> for
> deprecated modules? I can see two main approaches:
>
> (a) Remove the unit test when the code is deprecated, as
> this avoids
> warning messages from the test suite.
> (b) Remove the unit test only when the deprecated code is
> actually
> removed, as continuing to test the code will catch any
> unexpected
> breakage of the deprecated code.
>
> I lean towards (b), but wondered what other people think.
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
From fkauff at biologie.uni-kl.de Tue Aug 19 09:44:03 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Tue, 19 Aug 2008 15:44:03 +0200
Subject: [Biopython-dev] Unit tests for deprecated modules?
In-Reply-To: <320fb6e00808190352sd6437e0qb2898e39b15287b3@mail.gmail.com>
References: <320fb6e00808190352sd6437e0qb2898e39b15287b3@mail.gmail.com>
Message-ID: <48AACE23.3050107@biologie.uni-kl.de>
I favor option a. Deprecated modules are no longer under development, so
there's not much need for a unit test. A failed test would probably not
trigger any action anyway, because nobody's going to do much bugfixing
in deprecated modules.
Frank
Peter wrote:
> Are there any strong views about when to remove unit tests for
> deprecated modules? I can see two main approaches:
>
> (a) Remove the unit test when the code is deprecated, as this avoids
> warning messages from the test suite.
> (b) Remove the unit test only when the deprecated code is actually
> removed, as continuing to test the code will catch any unexpected
> breakage of the deprecated code.
>
> I lean towards (b), but wondered what other people think.
>
> Peter
From biopython at maubp.freeserve.co.uk Tue Aug 19 10:04:47 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 19 Aug 2008 15:04:47 +0100
Subject: [Biopython-dev] Unit tests for deprecated modules?
In-Reply-To: <48AACE23.3050107@biologie.uni-kl.de>
References: <320fb6e00808190352sd6437e0qb2898e39b15287b3@mail.gmail.com>
<48AACE23.3050107@biologie.uni-kl.de>
Message-ID: <320fb6e00808190704p4d19eb27if2927466a27f9b2a@mail.gmail.com>
On Tue, Aug 19, 2008 at 2:44 PM, Frank Kauff wrote:
> I favor option a. Deprecated modules are no longer under development, so
> there's not much need for a unit test. A failed test would probably not
> trigger any action anyway, because nobody's going to do much bugfixing in
> deprecated modules.
>
> Frank
That sounds like a mini-consensus; I'll remove the tests for the
deprecated modules shortly.
Peter
From mjldehoon at yahoo.com Thu Aug 21 07:10:19 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 21 Aug 2008 04:10:19 -0700 (PDT)
Subject: [Biopython-dev] Bio.MetaTool
Message-ID: <321861.5438.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
Bio.MetaTool is currently the only Biopython module that is not deprecated and still uses Martel. The Bio.MetaTool tests suggest that this module was written for MetaTool version 3.5 (28.03.2001), while the most current MetaTool version is at 5.0. It looks like the MetaTool output has changed between 3.5 and 5.0. Is anybody interested in this module? If not, I'll ask on the user list if anybody is actually using this module. MetaTool is written for Matlab/Octave, so I'd expect that few people are using it with Python.
--Michiel.
From biopython at maubp.freeserve.co.uk Thu Aug 21 07:38:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 21 Aug 2008 12:38:19 +0100
Subject: [Biopython-dev] Bio.MetaTool
In-Reply-To: <321861.5438.qm@web62404.mail.re1.yahoo.com>
References: <321861.5438.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00808210438n3b6c4b8bvadd9e9ab6be8712d@mail.gmail.com>
> Hi everybody,
>
> Bio.MetaTool is currently the only Biopython module that is not deprecated and still uses Martel.
> The Bio.MetaTool tests suggest that this module was written for MetaTool version 3.5
> (28.03.2001), while the most current MetaTool version is at 5.0. It looks like the MetaTool
> output has changed between 3.5 and 5.0.
I'd started looking into this, but hadn't established if the file
format itself had changed or not.
If it has changed, a lot more work would probably be required to bring
the module back up to date (assuming anyone is interested in using
this with Python).
Peter
From bugzilla-daemon at portal.open-bio.org Sun Aug 24 07:56:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 24 Aug 2008 07:56:33 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To:
Message-ID: <200808241156.m7OBuXlV023353@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2251
------- Comment #14 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-24 07:56 EST -------
On a topic related to moving from Numeric to NumPy, as of
the end of the SciPy 2008 conference, Travis Oliphant's
numpy book is now freely available. Quoting the numpy
mailing list Travis wrote:
> By the way, as promised, the NumPy book is now available
> for download and the source to the book is checked in to
> the numpy SVN tree:
http://svn.scipy.org/svn/numpy/trunk/doc/numpybook/
From http://scipy.org/Documentation
> Guide to NumPy by Travis Oliphant the lead developer of
> NumPy. This e-book is a complete reference to NumPy, this
> is a nice documentation to all features of NumPy. It was
> fee-based but as of Aug 21, 2008 it is in the public domain.
PDF file at http://www.tramy.us/numpybook.pdf
From bugzilla-daemon at portal.open-bio.org Sun Aug 24 21:08:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 24 Aug 2008 21:08:40 -0400
Subject: [Biopython-dev] [Bug 2570] record2title bug in SeqIO/FastaIO
In-Reply-To:
Message-ID: <200808250108.m7P18eE2008077@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2570
chris.lasher at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|chris.lasher at gmail.com |biopython-dev at biopython.org
------- Comment #1 from chris.lasher at gmail.com 2008-08-24 21:08 EST -------
Anyone want to double check this before I mark it as FIXED?
From bugzilla-daemon at portal.open-bio.org Sun Aug 24 22:08:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 24 Aug 2008 22:08:13 -0400
Subject: [Biopython-dev] [Bug 2570] record2title bug in SeqIO/FastaIO
In-Reply-To:
Message-ID: <200808250208.m7P28Dos009840@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2570
------- Comment #2 from sbassi at gmail.com 2008-08-24 22:08 EST -------
(In reply to comment #1)
> Anyone want to double check this before I mark it as FIXED?
>
Could you post steps to reproduce the bug?
From bugzilla-daemon at portal.open-bio.org Mon Aug 25 00:11:05 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 25 Aug 2008 00:11:05 -0400
Subject: [Biopython-dev] [Bug 2570] record2title bug in SeqIO/FastaIO
In-Reply-To:
Message-ID: <200808250411.m7P4B5O6013506@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2570
------- Comment #3 from chris.lasher at gmail.com 2008-08-25 00:11 EST -------
Created an attachment (id=988)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=988&action=view)
Script demonstrating NameError bug with record2title
See the attached brief script with CVS version 1.11 of FastaIO.py versus 1.12.
This is the traceback I get with 1.11:
Traceback (most recent call last):
File "fastaiorec2titlebroken.py", line 30, in <module>
writer.write_record(record)
File "/usr/local/lib/python2.5/site-packages/Bio/SeqIO/FastaIO.py", line 116, in write_record
title=self.clean(record2title(record))
NameError: global name 'record2title' is not defined
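For anyone who has not looked at the attachment, one common way to get this kind of NameError is sketched below. The classes are hypothetical and are not the FastaIO code; they only assume that a callback is stored on the instance and then called by its bare name.

```python
class BrokenWriter:
    """Stores a title callback but calls it by the wrong name."""

    def __init__(self, record2title=None):
        self.record2title = record2title

    def title(self, record):
        # Bug: the bare name is looked up as a global, not on the instance,
        # so this raises NameError when the method is called.
        return record2title(record)


class FixedWriter(BrokenWriter):
    """Same writer, with the callback looked up on the instance."""

    def title(self, record):
        return self.record2title(record)
```

The error only appears when a callback is actually supplied and used, which may be why a test suite that never passes one would miss it.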
From bugzilla-daemon at portal.open-bio.org Mon Aug 25 05:35:33 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 25 Aug 2008 05:35:33 -0400
Subject: [Biopython-dev] [Bug 2570] record2title bug in SeqIO/FastaIO
In-Reply-To:
Message-ID: <200808250935.m7P9ZXgC024796@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2570
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-25 05:35 EST -------
I spotted your change on the CVS RSS feed, and agree it looks like the correct
fix. Well spotted Chris, thanks.
I'm curious if you are actually trying to use Bio.SeqIO.FastaIO directly with
record2title, or if the script is purely to demonstrate the bug? This is going
off the topic of this bug, but passing optional arguments to the underlying
parser/writer via the Bio.SeqIO functions is a possible future extension we may
want to discuss on the mailing list (see also Bug 2443).
From bugzilla-daemon at portal.open-bio.org Tue Aug 26 14:32:28 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 26 Aug 2008 14:32:28 -0400
Subject: [Biopython-dev] [Bug 2571] New: Mailing lists and other email
problems at open-bio.org
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2571
Summary: Mailing lists and other email problems at open-bio.org
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: major
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Since 21 August 2008 emails sent to the OBF mailing lists have failed.
Depending on your ISP, you may have received "delayed but retrying" messages
for a couple of days, and then a failure message if you tried to post to the
list.
Interestingly emails sent by bugzilla to the mailing lists work (e.g.
Biopython-dev and bioperl-guts-l), perhaps because this is on the same machine
as the mail server. This also demonstrates that the archive is working and is
being updated.
e.g.
http://lists.open-bio.org/pipermail/biopython-dev/2008-August/date.html
[only Bugzilla after 21 Aug]
http://bioperl.org/pipermail/bioperl-guts-l/2008-August/date.html
[only Bugzilla after 21 Aug]
http://lists.open-bio.org/pipermail/biopython/2008-August/date.html
[nothing after 21 Aug]
http://bioperl.org/pipermail/bioperl-l/2008-August/date.html
[nothing after 21 Aug]
http://lists.open-bio.org/pipermail/biosql-l/2008-August/date.html
[nothing after 21 Aug]
As a way of trying to tell everyone, I'm filing this bug and have left it
assigned to the default of the Biopython-dev mailing list.
I have tried to email to support at helpdesk.open-bio.org to report the issue,
but this email failed in the same way, so I've since forwarded my email to OBF
people directly...
From bugzilla-daemon at portal.open-bio.org Tue Aug 26 15:59:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 26 Aug 2008 15:59:20 -0400
Subject: [Biopython-dev] [Bug 2571] Mailing lists and other email problems
at open-bio.org
In-Reply-To:
Message-ID: <200808261959.m7QJxKtx030123@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2571
------- Comment #1 from sbassi at gmail.com 2008-08-26 15:59 EST -------
(In reply to comment #0)
> Since 21 August 2008 emails sent to the OBF mailing lists have failed.
> Depending on your ISP, you may have received "delayed but retrying" messages
> for a couple of days, and then a failure message if you tried to post to the
> list.
I have the same problem (from gmail.com)
From bugzilla-daemon at portal.open-bio.org Tue Aug 26 19:58:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 26 Aug 2008 19:58:39 -0400
Subject: [Biopython-dev] [Bug 2571] Mailing lists and other email problems
at open-bio.org
In-Reply-To:
Message-ID: <200808262358.m7QNwdXs007459@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2571
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-26 19:58 EST -------
I had an update from Chris Dagdigian - things should be fixed now :)
######
OBF Notice regarding delayed & bouncing email
----------------------------------------------
For several days the OBF mail & mailing list server may have been unreachable
via the internet, resulting in queued undelivered emails and email "temporarily
undeliverable" bounce errors going back to people sending us email.
The issue was detected promptly on our end but since no OBF configuration
changes had been made, critical time passed while dealing with our upstream ISP
and network operations center on the assumption that an external firewall or
router configuration update was the culprit.
It was only after ruling out external causes that we started looking at
internal possibilities within our own infrastructure. We then spent more time
troubleshooting and ruling out obvious causes. With the obvious causes ruled
out we started trying truly odd things.
To our surprise, the physical reset of a security and intrusion prevention
device that has no role in email access, relaying or delivery actually fixed
the problem. The device sits between our colocation cage and the internet and
by all measures had appeared to be performing its functions without issue. We
still have no idea how it managed to only block SMTP traffic while working fine
in all other respects.
Apologies for the inconvenience; queued email that OBF community members had
tried to send us should still be queued up and waiting delivery at upstream
ISPs. As those systems retry and are (finally!) able to relay mail on to us
your delayed list messages and emails should start appearing.
- Chris Dagdigian, OBF mailteam
######
From biopython at maubp.freeserve.co.uk Sun Aug 24 09:00:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sun, 24 Aug 2008 14:00:57 +0100
Subject: [Biopython-dev] Fwd: Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808220935j682fe784l7a7f38dce3fec916@mail.gmail.com>
References: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
<320fb6e00808170719s1edeb491g883e9b3db4437e67@mail.gmail.com>
<320fb6e00808220935j682fe784l7a7f38dce3fec916@mail.gmail.com>
Message-ID: <320fb6e00808240600s540c90e8if392c715fa6cbcd8@mail.gmail.com>
I'm trying to send this again, as there seems to have been a glitch on
the mailing list...
---------- Forwarded message ----------
From: Peter
Date: Fri, Aug 22, 2008 at 5:35 PM
Subject: Re: Online access, Bio.PubMed & Bio.GenBank vs Bio.Entrez
To: BioPython-Dev Mailing List
On the topic of ESearch and date restrictions, I wrote:
> ... in my testing of the date restriction arguments (reldate,
> mindate and maxdate) the URL was constructed correctly, but the
> searches returned no hits. Indeed, there is a comment in the
> Bio.PubMed source code (revision 1.4, Jeff Chang in 2003):
>
> XXX The date parameters don't seem to be working with NCBI's
> script. Please let me know if you can get it to work.
>
> It looks like I'm not the only one to find this (I was using the
> nucleotide database instead of pubmed).
There is some interesting information on the BioPerl wiki related to this:
http://www.bioperl.org/wiki/HOWTO:EUtilities_Cookbook
>> Set the -reldate parameter to the number of days prior to today's
>> date. As a note, for some reason NCBI has dropped the EDAT
>> field for the nucleotide and protein databases (which is the default
>> date type when using -reldate); in this case use the -datetype flag
>> to MDAT (modification date) or PDAT (publication date, or the date
>> added to the database).
Hopefully I'll remember to try this with Bio.Entrez and, if it works,
update our documentation accordingly.
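
As a rough sketch of what the BioPerl note suggests, the snippet below builds an ESearch URL that combines reldate with an explicit datetype. It only constructs the URL and makes no network request. The base URL is the current public E-utilities address, and the helper name and the search term are made up for illustration. The same parameters should be passable as keyword arguments to Bio.Entrez.esearch, although that is untested here.

```python
from urllib.parse import urlencode

# Current public E-utilities endpoint (assumption: the address has changed since 2008).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, reldate=None, datetype=None):
    """Return an ESearch URL (hypothetical helper, not part of Biopython).

    reldate  -- number of days before today to search back
    datetype -- which date field to restrict on, e.g. "mdat" or "pdat"
    """
    params = {"db": db, "term": term}
    if reldate is not None:
        params["reldate"] = str(reldate)
    if datetype is not None:
        params["datetype"] = datetype
    return ESEARCH_URL + "?" + urlencode(params)


# Nucleotide records modified in the last 30 days (example term is arbitrary).
url = build_esearch_url("nucleotide", "opuntia[ORGN]", reldate=30, datetype="mdat")
```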
Peter
From bugzilla-daemon at portal.open-bio.org Wed Aug 27 05:19:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 27 Aug 2008 05:19:34 -0400
Subject: [Biopython-dev] [Bug 2571] Mailing lists and other email problems
at open-bio.org
In-Reply-To:
Message-ID: <200808270919.m7R9JYJd003225@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2571
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-08-27 05:19 EST -------
The OBF team have indeed fixed things. Thanks guys!
From bugzilla-daemon at portal.open-bio.org Wed Aug 27 16:56:24 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 27 Aug 2008 16:56:24 -0400
Subject: [Biopython-dev] [Bug 2574] New: PDB download ftp should be newer
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2574
Summary: PDB download ftp should be newer
Product: Biopython
Version: 1.47
Platform: PC
URL: http://www.rcsb.org/pdb/static.do?p=download/ftp/index.html
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: xubeisi at gmail.com
QAContact: xubeisi at gmail.com
Just a small change should be made to Bio/PDB/PDBList.py
FTP Access
The PDB archive that has been remediated by the wwPDB is available from
ftp.wwpdb.org. Searches and reports performed on this RCSB PDB website utilize
these data.
An overview of the wwPDB remediation project and a document detailing the types
of changes made and not made is available at www.wwpdb.org/docs.html. If you
have any questions about this archive please send email to info at wwpdb.org.
Note: Users should switch to binary mode before downloading data files.
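
For illustration, a minimal sketch of fetching one entry from ftp.wwpdb.org in binary mode with the standard library ftplib module. This is not the code in Bio/PDB/PDBList.py. The directory layout, the ".ent.gz" suffix and both function names are assumptions and should be checked against the live server, which may no longer accept plain FTP.

```python
from ftplib import FTP

WWPDB_HOST = "ftp.wwpdb.org"
# Assumed archive layout and compression suffix; verify before relying on them.
DIVIDED_DIR = "/pub/pdb/data/structures/divided/pdb"
SUFFIX = ".ent.gz"


def pdb_remote_path(pdb_code):
    """Return the assumed remote path for a four character PDB code."""
    code = pdb_code.lower()
    if len(code) != 4:
        raise ValueError("PDB codes have four characters: %r" % pdb_code)
    # Entries are grouped in subdirectories named after the middle two characters.
    return "%s/%s/pdb%s%s" % (DIVIDED_DIR, code[1:3], code, SUFFIX)


def download_pdb(pdb_code, local_filename):
    """Download one compressed entry (needs network access)."""
    ftp = FTP(WWPDB_HOST)
    try:
        ftp.login()  # anonymous login
        with open(local_filename, "wb") as handle:
            # retrbinary transfers in binary mode, as the note above requires.
            ftp.retrbinary("RETR " + pdb_remote_path(pdb_code), handle.write)
    finally:
        ftp.close()
```

Calling download_pdb("1FAT", "pdb1fat.ent.gz") would attempt the transfer; pdb_remote_path can be tried on its own without a connection.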
From bugzilla-daemon at portal.open-bio.org Wed Aug 27 17:08:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 27 Aug 2008 17:08:14 -0400
Subject: [Biopython-dev] [Bug 2574] PDB download ftp should be newer
In-Reply-To:
Message-ID: <200808272108.m7RL8Ee5011105@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2574
xubeisi at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
AssignedTo|biopython-dev at biopython.org |xubeisi at gmail.com
Status|NEW |ASSIGNED
------- Comment #1 from xubeisi at gmail.com 2008-08-27 17:08 EST -------
Created an attachment (id=989)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=989&action=view)
ftp address and compress suffix
Two lines changed: the FTP server address and the compression suffix.
From mjldehoon at yahoo.com Wed Aug 27 20:12:14 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 27 Aug 2008 17:12:14 -0700 (PDT)
Subject: [Biopython-dev] NumPy
Message-ID: <579747.5381.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
Previously we discussed on this mailing list whether Biopython should adopt the "new" Numerical Python (aka NumPy, currently at version 1.1.1) instead of the "old" Numerical Python (version 24.2). My objections against NumPy were that its documentation is not freely available, it doesn't compile cleanly on all platforms, and some other scientific and computational biology libraries use the old Numerical Python.
Last week, the NumPy documentation did become freely available. Compilation of NumPy is still not perfect on all platforms (e.g. on Cygwin it may fail), however recently I have also noticed that compilation of the "old" Numerical Python may fail on modern systems. As far as I can tell, MMTK and PyMOL are (still?) based on the "old" Numerical Python, but Matplotlib now relies on the "new" Numerical Python.
In my opinion, the balance is now tilting in favor of the new NumPy, and we should consider transitioning Biopython to the new NumPy. Does anybody have a different opinion? If not, I suggest we bring this up on the user mailing list.
--Michiel.
From biopython at maubp.freeserve.co.uk Thu Aug 28 05:38:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 28 Aug 2008 10:38:41 +0100
Subject: [Biopython-dev] NumPy
In-Reply-To: <579747.5381.qm@web62404.mail.re1.yahoo.com>
References: <579747.5381.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00808280238u7a2a367dhbbe78d32f82e2c28@mail.gmail.com>
On Thu, Aug 28, 2008 at 1:12 AM, Michiel de Hoon wrote:
> Hi everybody,
>
> Previously we discussed on this mailing list whether Biopython should adopt
> the "new" Numerical Python (aka NumPy, currently at version 1.1.1)
> instead of the "old" Numerical Python (version 24.2). My objections against
> NumPy were that its documentation is not freely available, it doesn't
> compile cleanly on all platforms, and some other scientific and
> computational biology libraries use the old Numerical Python.
>
> Last week, the NumPy documentation did become freely available.
> Compilation of NumPy is still not perfect on all platforms (e.g. on Cygwin
> it may fail), however recently I have also noticed that compilation of the
> "old" Numerical Python may fail on modern systems. As far as I can tell,
> MMTK and PyMOL are (still?) based on the "old" Numerical Python, but
> Matplotlib now relies on the "new" Numerical Python.
>
> In my opinion, the balance is now tilting in favor of the new NumPy,
> and we should consider transitioning Biopython to the new NumPy.
> Does anybody have a different opinion? If not, I suggest we bring this
> up on the user mailing list.
I agree that we should be aiming to transition from Numeric to NumPy.
One question is whether NumPy 1.2 will be binary compatible with NumPy 1.1
(for our compiled C extensions). This isn't true for the NumPy 1.2 beta
releases thus far, but it sounds like they are going to try:
http://projects.scipy.org/pipermail/numpy-discussion/2008-August/036508.html
http://projects.scipy.org/pipermail/numpy-discussion/2008-August/036909.html
Probably in the short term we should just target NumPy 1.1
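
As a small sketch of what "targeting NumPy 1.1" could mean at run time, the helper below checks the installed NumPy version against a minimum. Both function names are hypothetical and nothing like this exists in Biopython. It says nothing about binary compatibility of compiled extensions, which depends on the NumPy version they were built against.

```python
def version_tuple(version):
    """Turn a version string into a tuple of its leading integers.

    Parsing stops at the first component that does not start with a digit,
    and any trailing text such as a beta tag is ignored.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if char.isdigit():
                digits += char
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def numpy_is_new_enough(minimum=(1, 1)):
    """Return True if NumPy is importable and at least the given version."""
    try:
        import numpy
    except ImportError:
        return False
    return version_tuple(numpy.__version__) >= minimum
```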
Peter
From david.w.h.chin at gmail.com Thu Aug 28 11:11:11 2008
From: david.w.h.chin at gmail.com (David Chin)
Date: Thu, 28 Aug 2008 11:11:11 -0400
Subject: [Biopython-dev] NumPy
In-Reply-To: <579747.5381.qm@web62404.mail.re1.yahoo.com>
References: <579747.5381.qm@web62404.mail.re1.yahoo.com>
Message-ID:
On Wed, Aug 27, 2008 at 20:12, Michiel de Hoon wrote:
[snip]
>
> [snip] As far as I can tell, MMTK and PyMOL are (still?) based on the "old"
> Numerical Python, but Matplotlib now relies on the "new" Numerical Python.
I've been using Matplotlib with Numpy on Linux (RedHat 3,4, Fedora
5-9) and MacOSX (10.3, 10.4) for the last 6 years or so. Matplotlib
has been using Numpy for at least 3 years, and it's been very stable.
I never found the lack of printable documentation to be a problem
because the built-in doc strings (easily accessible with ipython) and
the available tutorials were more than adequate.
As for binary compatibility, I would suggest biting the bullet and
doing the port of the C extensions to 1.2 at the same time. I've had
to port some of my numpy-1.1 C extensions to numpy-1.2, and it wasn't
bad at all. But then, all I was doing was reading into standard Python
objects rather than Numpy-specific ones. Just my 2 cents.
Cheers,
Dave Chin
--
Email: david.w.h.chin AT gmail.com dwchin AT lroc.harvard.edu
http://gallatin.physics.lsa.umich.edu/~dwchin/Work
Public key: http://david.w.h.chin.googlepages.com/crypto.html
pub 2048D/1D8A5BC2 2008-08-18 [expires: 2009-08-18]
uid David Chin
Key fingerprint = A8A0 6E91 6A42 F42E BF18 2202 29C0 A056 1D8A 5BC2