From bugzilla-daemon at portal.open-bio.org Tue Sep 2 09:06:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Sep 2008 09:06:51 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200809021306.m82D6p9i021009@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-02 09:06 EST -------
Hi Frank,
Did you get a chance to look at that code for named ancestors?
Thanks
Peter
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bugzilla-daemon at portal.open-bio.org Tue Sep 2 10:05:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Sep 2008 10:05:17 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200809021405.m82E5HRM025041@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
------- Comment #4 from cymon.cox at gmail.com 2008-09-02 10:05 EST -------
Hi Peter,
> Can I ask if you've actually come across trees with named ancestor nodes in
> "real life"? That would make this bug more important. If so, the name of the
> tool would be interesting,
P4 (http://code.google.com/p/p4-phylogenetics/) is the only one I'm aware of.
As Frank implies, the labels at nodes aren't necessarily names of ancestors but
rather are just labels that can be any text. In P4 they are just a
string attribute of the node object. P4 uses them primarily to aid tree
drawing. Support indices in phylogenetics are properties of branches, and this
is fine in an unrooted tree context. But most systematists want to orientate the
tree, i.e. root it informally, and refer to a particular node as having the support
value of its subtending branch. It is therefore useful to transfer the branch
support values to node labels before drawing the tree.
> an example tree file would be great to add to
> Biopython as a test case.
How about this:
             +--------------3:t9
      +------2:B
      |      |     +----------------5:t8
      |      +-----4:C
+-----1:A          +---------------6:t4
|     |
|     +---------------7:t6
|
|------------------8:t2
0
|           +------------11:t0
|     +-----10:E
|     |     +-----------------12:t7
|     |
+-----9:D          +------15:t5
      |     +------14:G
      +-----13:F   +-----------------16:t3
            |
            +--------17:t1
"""
#NEXUS
begin taxa;
dimensions ntax=10;
taxlabels t0 t1 t2 t3 t4 t5 t6 t7 t8 t9;
end;
begin trees;
tree random = [&U] (((t9:0.385832, (t8:0.445135,
t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673,
((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167,
t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);
end;
"""
(Hi Frank)
Cheers, Cymon.
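
To make the request concrete, here is a minimal sketch of reading a Newick string while keeping the labels on internal nodes (A to G in the tree above). It is not the Bio.Nexus.Trees code; the function names parse_newick and internal_labels and the nested-dictionary layout are assumptions made for illustration, and the sketch uses modern Python syntax. Quoted labels and [...] comments such as [&U] are not handled.

```python
def parse_newick(text):
    """Parse a Newick string into nested dicts, keeping internal node labels.

    Simplifications (assumptions of this sketch): no quoted labels, no
    [...] comments, and all whitespace is discarded before parsing.
    """
    text = "".join(text.split()).rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if pos < len(text) and text[pos] == "(":
            pos += 1  # consume "("
            while True:
                children.append(parse_node())
                if pos >= len(text):
                    raise ValueError("unbalanced parentheses")
                if text[pos] == ",":
                    pos += 1
                elif text[pos] == ")":
                    pos += 1
                    break
                else:
                    raise ValueError("unexpected %r at %d" % (text[pos], pos))
        # Optional label: applies to leaves and to internal nodes alike.
        start = pos
        while pos < len(text) and text[pos] not in "(),:":
            pos += 1
        label = text[start:pos] or None
        # Optional branch length after ":".
        length = None
        if pos < len(text) and text[pos] == ":":
            pos += 1
            start = pos
            while pos < len(text) and text[pos] not in "(),":
                pos += 1
            length = float(text[start:pos])
        return {"label": label, "length": length, "children": children}

    root = parse_node()
    if pos != len(text):
        raise ValueError("trailing text after tree")
    return root


def internal_labels(node):
    """Return the labels of labelled internal nodes, in preorder."""
    found = []
    if node["children"] and node["label"] is not None:
        found.append(node["label"])
    for child in node["children"]:
        found.extend(internal_labels(child))
    return found


tree = parse_newick("((t8:0.445135,t4:0.41401)C:0.024032,t9:0.385832)B;")
print(internal_labels(tree))
```

The key point is that the label is read after the closing parenthesis in exactly the same way as a leaf name, so a parser that only expects a branch length (or a support value) at that position will reject such trees.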
From bugzilla-daemon at portal.open-bio.org Tue Sep 2 11:44:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Sep 2008 11:44:11 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named
ancestors
In-Reply-To:
Message-ID: <200809021544.m82FiBej030561@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2543
------- Comment #5 from fkauff at biologie.uni-kl.de 2008-09-02 11:44 EST -------
(In reply to comment #3)
Hi Peter,
I haven't done anything yet. The previously mentioned code works differently
(assigning values to nodes within a [&...] comment), rather than assigning names to
nodes.
Assigning names to nodes can be very useful, but as Cymon mentions, P4 seems to
be the only program that can handle them.
In my opinion, naming nodes is a feature, and I would not regard the lack of
this feature as a bug.
But I'll have a look at the code and see how easily this can be changed. It would
actually be nice if P4 and Bio.Nexus, both being Python programs, could read
each other's trees.
(Hi Cymon :-) )
Frank
> Hi Frank,
>
> Did you get a chance to look at that code for named ancestors?
>
> Thanks
>
> Peter
>
From biopython at maubp.freeserve.co.uk Tue Sep 2 11:49:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 2 Sep 2008 16:49:09 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
Message-ID: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
Dear all,
Is there anything we need to address before starting to prepare Biopython 1.48?
This is likely to be the final Numeric only release of Biopython, with
the following release hopefully supporting both Numeric and numpy in
some way.
(As an aside, once we support numpy, having scipy as an optional
dependency for the statistics Tiago wanted to use in Bio.PopGen does
seem less onerous.)
Regarding potentially deprecating Bio.Mindy and Martel, I propose we
do this in the following release (i.e. after Biopython 1.48). We can
then drop the mxTextTools dependency completely.
While I would like to address some of the enhancements (esp Bug 2530)
these can wait. Ignoring the enhancements, there are several "small"
issues on bugzilla that could be dealt with, but nothing that I think
warrants delaying the release.
One question: Currently Bio.SeqIO in CVS has partial support for
writing GenBank files (basically the sequence and minimal annotation -
no references, no features). I don't want to rush something out
without proper testing, so do people think it would be better to ship
with this partial support, or temporarily disable it (a one line
change in Bio/SeqIO/__init__.py to the _FormatToWriter dictionary, and
probably refreshing the expected unit test output).
Comments and suggestions welcome!
Thanks,
Peter
From tiagoantao at gmail.com Tue Sep 2 14:25:31 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 2 Sep 2008 19:25:31 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
Message-ID: <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
Hi All,
On Tue, Sep 2, 2008 at 4:49 PM, Peter wrote:
> (As an aside, once we support numpy, having scipy as an optional
> dependency for the statistics Tiago wanted to use in Bio.PopGen does
> seem less onerous.)
>
First, my apologies for not reporting back from BOSC, but I was on a
conference/professional visit spree for the last 3 months and returned last
month. Basically it was not OK: I arrived there from a previous conference
and did the presentation with little sleep; it was probably the sloppiest
presentation of my whole life. My sincere apologies.
On a better front, I have a lot of new content for Bio.PopGen, a few
remarks:
1. No documentation and testing done, so I will skip adding content to 1.48.
But I will surely add to 1.49.
2. None of the new content relies on scipy (as there was no agreement on
that), but being able to use scipy would make things much easier. Most of
anything that can be called "population genetics" is nothing more than
statistics (statistics were invented because of population genetics). So a
change in policy would be welcomed (and would make Bio.PopGen really useful
for a wide audience - currently it has only niche users).
On another front, we published a paper using content from Bio.PopGen 1.44
http://www.biomedcentral.com/1471-2105/9/323
Regards,
Tiago
From biopython at maubp.freeserve.co.uk Tue Sep 2 15:05:31 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 2 Sep 2008 20:05:31 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To:
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
<6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
Message-ID: <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
Tiago wrote:
>> First, my apologies for not reporting back from BOSC, but I was on a
>> conference/professional visit spree for the last 3 months and returned last
>> month. Basically it was not OK: I arrived there from a previous conference
>> and did the presentation with little sleep; it was probably the
>> sloppiest presentation of my whole life. My sincere apologies.
I hardly dared ask how you felt at the end of your almost round the
world trip ;)
Jared wrote:
> Despite Tiago's self-criticism I thought his BOSC presentation was fine and
> up to par with the rest of them.
>
> jared
That sounds much more positive :)
This reminds me that I could/should make a PDF version of the BOSC
2008 slides to go online here:
http://biopython.org/wiki/Documentation#Presentations
>> On a better front, I have a lot of new content for Bio.PopGen, a few
>> remarks:
>> 1. No documentation and testing done, so I will skip adding content to
>> 1.48. But I will surely add to 1.49.
That sounds sensible, and another reason to get Biopython 1.48 out
soon. Depending how my day goes tomorrow, I could try then.
>> 2. None of the new content relies on scipy (as there was no agreement on
>> that), but being able to use scipy would make things much easier. Most of
>> anything that can be called "population genetics" is nothing more than
>> statistics (statistics were invented because of population genetics). So a
>> change in policy would be welcomed (and would make Bio.PopGen really
>> useful for a wide audience - currently it has only niche users).
Let's get the move from Numeric to NumPy done after Biopython 1.48,
and re-open the possible SciPy dependency question then.
>> In another front, we published a paper using content from Bio.PopGen 1.44
>> http://www.biomedcentral.com/1471-2105/9/323
Excellent,
Peter
From jflatow at northwestern.edu Tue Sep 2 14:42:52 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Tue, 2 Sep 2008 13:42:52 -0500
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
<6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
Message-ID:
Despite Tiago's self-criticism I thought his BOSC presentation was
fine and up to par with the rest of them.
jared
On Sep 2, 2008, at 1:25 PM, Tiago Antão wrote:
> Hi All,
>
> On Tue, Sep 2, 2008 at 4:49 PM, Peter
> wrote:
>
>> (As an aside, once we support numpy, having scipy as an optional
>> dependency for the statistics Tiago wanted to use in Bio.PopGen does
>> seem less onerous.)
>>
>
> First, my apologies for not reporting back from BOSC, but I was on a
> conference/professional visit spree for the last 3 months and returned last
> month. Basically it was not OK: I arrived there from a previous conference
> and did the presentation with little sleep; it was probably the sloppiest
> presentation of my whole life. My sincere apologies.
>
> On a better front, I have a lot of new content for Bio.PopGen, a few
> remarks:
> 1. No documentation and testing done, so I will skip adding content
> to 1.48.
> But I will surely add to 1.49.
> 2. None of the new content relies on scipy (as there was no
> agreement on
> that), but being able to use scipy would make things much easier.
> Most of
> anything that can be called "population genetics" is nothing more than
> statistics (statistics were invented because of population
> genetics). So a
> change in policy would be welcomed (and would make Bio.PopGen really
> useful
> for a wide audience - currently it has only niche users).
>
>
> In another front, we published a paper using content from Bio.PopGen
> 1.44
> http://www.biomedcentral.com/1471-2105/9/323
>
> Regards,
> Tiago
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
From tiagoantao at gmail.com Tue Sep 2 15:29:55 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 2 Sep 2008 20:29:55 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
<6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
<320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
Message-ID: <6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com>
On Tue, Sep 2, 2008 at 8:05 PM, Peter wrote:
> This reminds me that I could/should make a PDF version of the BOSC
> 2008 slides to go online here:
> http://biopython.org/wiki/Documentation#Presentations
>
http://www.slideshare.net/tiago/bosc-2008-biopython
It has been there for a month, but I completely forgot to mention it.
Tiago
From bugzilla-daemon at portal.open-bio.org Wed Sep 3 12:46:30 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Sep 2008 12:46:30 -0400
Subject: [Biopython-dev] [Bug 2578] New: The GenBank SeqRecord parser does
not record module type or if circular
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2578
Summary: The GenBank SeqRecord parser does not record module type
or if circular
Product: Biopython
Version: 1.47
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
Filing this bug after discussion on the mailing list, where the issue was
raised by Chris Lasher:
http://lists.open-bio.org/pipermail/biopython/2008-September/004474.html
http://lists.open-bio.org/pipermail/biopython/2008-September/004475.html
http://lists.open-bio.org/pipermail/biopython/2008-September/004476.html
The LOCUS line at the start of a GenBank record can record the molecule type
(DNA, RNA, mRNA, protein etc) and also if the sequence is linear or circular,
e.g.
LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008
Currently Bio.SeqIO (and Bio.GenBank.FeatureParser if called directly) does not
record these two bits of information in the SeqRecord.
Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this
information from the Scanner via the residue_type event. This is a combined
lump of data containing both the sequence type (DNA, RNA etc) and if it is
linear or circular. It is currently only used to determine the Seq alphabet,
and has never been recorded. So in addition to not recording if the LOCUS line
said the sequence was circular, if the LOCUS line contained cDNA, mRNA, ...
this fine detail is also currently lost in the SeqRecord representation. On
the other hand, the Bio.GenBank.RecordParser stores all this as the record's
residue_type property (a single combined field, presumably reflecting the
layout of early GenBank files).
It would be a logical improvement to record the sequence data (molecule type
and if circular) in the SeqRecord's annotations dictionary - perhaps as two
fields but we'd need to check if that would be straightforward for EMBL files
too. Alternatively, if Biopython included a native CircularSeq object, we
could use that explicitly when the sequence is declared as circular. This
might be considered a little surprising though.
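
As a rough illustration of the two pieces of information in question, the sketch below pulls the molecule type and the topology out of a LOCUS line by splitting on whitespace. This is not how Bio.GenBank's Scanner works; the function name parse_locus_line and the dictionary keys "molecule_type" and "topology" are hypothetical names chosen for the example. It assumes a nucleotide record where the molecule type directly follows the "bp" unit; if the molecule type is missing, the next token (the division code) would be misread as the type.

```python
def parse_locus_line(line):
    """Extract molecule type and topology from a GenBank LOCUS line.

    Assumes a whitespace-separated nucleotide LOCUS line in which the
    molecule type directly follows the "bp" unit. The dictionary keys
    are illustrative names, not an agreed Biopython convention.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    info = {"molecule_type": None, "topology": None}
    if "bp" not in tokens:
        return info  # e.g. protein records, which this sketch ignores
    rest = tokens[tokens.index("bp") + 1:]
    # First token after the unit is taken to be the molecule type.
    if rest and rest[0].lower() not in ("linear", "circular"):
        info["molecule_type"] = rest[0]
    # The topology keyword is optional; leave None when it is absent.
    for token in rest:
        if token.lower() in ("linear", "circular"):
            info["topology"] = token.lower()
            break
    return info


example = "LOCUS       NC_002678            7036071 bp    DNA     circular BCT 22-JUL-2008"
print(parse_locus_line(example))
```

Keeping the two values in separate fields, as suggested above, avoids having to re-split a combined residue_type string later.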
From bugzilla-daemon at portal.open-bio.org Wed Sep 3 12:54:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Sep 2008 12:54:35 -0400
Subject: [Biopython-dev] [Bug 2578] The GenBank SeqRecord parser does not
record module type or if circular
In-Reply-To:
Message-ID: <200809031654.m83GsZ5G017770@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2578
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-03 12:54 EST -------
Note that after any change made to record this information, the preliminary
GenBank writing support for Bio.SeqIO should also be updated - see Bug 2294.
It would also be sensible to see how BioPerl, BioJava etc store this
information within BioSQL so that if possible we can do it the same way. I'm
assuming this is just a case of picking the same text (term table key) for our
annotations dictionary key.
From bugzilla-daemon at portal.open-bio.org Wed Sep 3 12:55:38 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 3 Sep 2008 12:55:38 -0400
Subject: [Biopython-dev] [Bug 2578] The GenBank SeqRecord parser does not
record molecule type or if circular
In-Reply-To:
Message-ID: <200809031655.m83GtcPl017915@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2578
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|The GenBank SeqRecord parser|The GenBank SeqRecord parser
|does not record module type |does not record molecule
|or if circular |type or if circular
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-03 12:55 EST -------
Fixed the typo in "molecule" for the bug title. Whoops.
From biopython at maubp.freeserve.co.uk Thu Sep 4 14:36:44 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 4 Sep 2008 19:36:44 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
<6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
<320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
<6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com>
Message-ID: <320fb6e00809041136q4c01360ane4b0efcc58a68b8a@mail.gmail.com>
On Tue, Sep 2, 2008 at 8:29 PM, Tiago Antão wrote:
> On Tue, Sep 2, 2008 at 8:05 PM, Peter wrote:
>
>> This reminds me that I could/should make a PDF version of the BOSC
>> 2008 slides to go online here:
>> http://biopython.org/wiki/Documentation#Presentations
>>
>
> http://www.slideshare.net/tiago/bosc-2008-biopython
> It has been there for a month, but I completely forgot to mention it.
I spotted the slide share thing (it surprised me last year) and added
that to the wiki page already.
On to practical matters, I've just done a clean check out on Linux and
run the test suite, everything passes except these ones with external
dependencies:
test_GFF ... skipping. Environment is not configured for this test
(not important if you do not plan to use Bio.GFF).
test_PopGen_FDist ... skipping. Fdist not found (not a problem if you
do not intend to use it).
test_PopGen_SimCoal ... skipping. SimCoal not found (not a problem if
you do not intend to use it).
test_Wise ... skipping. sh: dnal: command not found
test_psw ... skipping. sh: dnal: command not found
I've never installed any of the PopGen tools - I presume these tests
are still OK on your machine(s) Tiago?
I've also never installed dnal, or set up whatever test_GFF wants.
Could anyone confirm these are OK?
I have run the BioSQL tests though.
Peter
From tiagoantao at gmail.com Fri Sep 5 06:05:26 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Fri, 5 Sep 2008 11:05:26 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <320fb6e00809041136q4c01360ane4b0efcc58a68b8a@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
<6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>
<320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
<6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com>
<320fb6e00809041136q4c01360ane4b0efcc58a68b8a@mail.gmail.com>
Message-ID: <6d941f120809050305v7f991fe0jcfdfc650936e7348@mail.gmail.com>
On Thu, Sep 4, 2008 at 7:36 PM, Peter wrote:
> I've never installed any of the PopGen tools - I presume these tests
> are still OK on your machine(s) Tiago?
>
>
All tests are OK here (Linux x86).
From biopython at maubp.freeserve.co.uk Fri Sep 5 06:43:27 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 5 Sep 2008 11:43:27 +0100
Subject: [Biopython-dev] CVS freeze for Biopython 1.48
Message-ID: <320fb6e00809050343y598c436fi9aa65ec272f1492d@mail.gmail.com>
Dear all,
I'm going to try and put together Biopython 1.48 this afternoon, so
could you all not commit any changes until further notice please.
I'll be doing the source code releases (and possibly a Windows
installer for Python 2.3 tonight if my old laptop still has all the MS
compilers working), but there will then be a slight delay while we get
the (other) Windows installers done.
Thank you,
Peter
From biopython at maubp.freeserve.co.uk Fri Sep 5 19:19:13 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 6 Sep 2008 00:19:13 +0100
Subject: [Biopython-dev] Nexus issue on Windows with Python 2.3
Message-ID: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com>
Can anyone using Windows and/or Python 2.3 try running the test suite?
I'm seeing a problem with test_SeqIO.py and test_AlignIO.py when they
call Bio.Nexus to construct a particular Nexus object (using sequences
originally read in from Tests/Nexus/test_Nexus_input.nex for what it's
worth). This triggers:
TypeError: zip() requires at least one sequence
On Linux with Python 2.4, and Mac OS X with Python 2.5 these two tests
both passed for me.
Peter
From biopython at maubp.freeserve.co.uk Sat Sep 6 05:06:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 6 Sep 2008 10:06:20 +0100
Subject: [Biopython-dev] Nexus issue on Windows with Python 2.3
In-Reply-To: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com>
References: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com>
Message-ID: <320fb6e00809060206q88a44cblb1188380ae921dde@mail.gmail.com>
On Sat, Sep 6, 2008 at 12:19 AM, Peter wrote:
> Can anyone using Windows and/or Python 2.3 try running the test suite?
>
> I'm seeing a problem with test_SeqIO.py and test_AlignIO.py when they
> call Bio.Nexus to construct a particular Nexus object (using sequences
> originally read in from Tests/Nexus/test_Nexus_input.nex for what it's
> worth). This triggers:
>
> TypeError: zip() requires at least one sequence
>
> On Linux with Python 2.4, and Mac OS X with Python 2.5 these two tests
> both passed for me.
This was a Python 2.3 problem: the Nexus code (Bio/Nexus/Nexus.py line
1633) relied on behaviour that only exists in Python 2.4+, where zip() with
no arguments returns an empty list instead of raising TypeError. See
http://docs.python.org/lib/built-in-funcs.html
sitesm=zip(*[self.matrix[t].tostring() for t in self.taxlabels])
I've added a check in CVS for an empty list of taxlabels (with a
comment about python 2.3).
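
The kind of guard described can be sketched as follows. This is not the code committed to CVS; the helper name site_columns is hypothetical, and plain strings in a dictionary stand in for the sequence objects held in self.matrix.

```python
def site_columns(matrix, taxlabels):
    """Transpose the sequences (rows) into alignment columns (sites)."""
    rows = [matrix[taxon] for taxon in taxlabels]
    if not rows:
        # With no taxa, zip(*rows) becomes zip() with no arguments,
        # which raised TypeError on Python 2.3. Return early instead.
        return []
    return list(zip(*rows))


print(site_columns({"t0": "ACGT", "t1": "AC-T"}, ["t0", "t1"]))
print(site_columns({}, []))
```

Checking for the empty list explicitly keeps the behaviour identical on every Python version rather than depending on what zip() does with no arguments.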
Peter
From biopython at maubp.freeserve.co.uk Sat Sep 6 06:04:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 6 Sep 2008 11:04:08 +0100
Subject: [Biopython-dev] New line issues in the source zip or tarballs
Message-ID: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
I've run into a little issue on Windows while preparing Biopython 1.48.
If you check out the latest code from CVS on Windows, then assuming
the CVS client is set up correctly, all the Python files and plain text
input files get DOS/Windows newlines. Running the test suite looks OK
(there are a few little known issues I've previously mentioned).
However, having built a release on Linux, it seems both the tarball
and the zip file contain the text files using Unix newlines. I can
build Biopython on Windows from the zip file (Unix newlines are not a
problem for running the python code), but it does break a few of the
unit tests (test_SCOP_Cla.py and test_SCOP_Raf.py and
test_PopGen_SimCoal_nodepend.py).
This is only an issue for the minority of Windows users who will
actually run the test suite. Most will just use the click-and-run
installers which don't include the tests, and I expect anyone trying
to build Biopython on Windows will probably use CVS. So we could just
ignore this for the time being...
One solution would be to try and tweak the source code distributions
so the tarball uses linux line endings, while the zip file uses
DOS/Windows. This does seem nasty.
Or, we can try and tweak the failing unit tests to cope with their
input files in either format.
In the case of test_PopGen_SimCoal_nodepend.py the failure is
expecting simple.par and simple_100_30.par to be exactly the same size
(in class TemplateTest, line 47). This is not going to be true
when the input file uses Unix new lines but the generated file uses
Windows new lines. Perhaps using a simple bit of code to load the
files line by line and compare them would work here?
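
A minimal sketch of such a line-by-line comparison is below. It is written for current Python rather than the versions under discussion, the helper name same_lines is an assumption, and the two file names are made up for the demonstration; it is not the fix that went into the test.

```python
import os
import tempfile
from itertools import zip_longest


def same_lines(path_a, path_b):
    """Compare two text files line by line, ignoring the newline style."""
    # Text mode with newline=None translates "\r\n" and "\r" to "\n".
    with open(path_a, newline=None) as a, open(path_b, newline=None) as b:
        # zip_longest pads the shorter file with None, so files with a
        # different number of lines compare as unequal.
        return all(x == y for x, y in zip_longest(a, b))


with tempfile.TemporaryDirectory() as folder:
    unix_path = os.path.join(folder, "unix.par")
    dos_path = os.path.join(folder, "dos.par")
    with open(unix_path, "wb") as handle:
        handle.write(b"line one\nline two\n")
    with open(dos_path, "wb") as handle:
        handle.write(b"line one\r\nline two\r\n")
    sizes_match = os.path.getsize(unix_path) == os.path.getsize(dos_path)
    lines_match = same_lines(unix_path, dos_path)

print(sizes_match, lines_match)
```

The size check fails because each Windows newline is one byte longer, while the line-by-line check only looks at the text content.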
Peter
From fkauff at biologie.uni-kl.de Mon Sep 8 03:34:32 2008
From: fkauff at biologie.uni-kl.de (Frank Kauff)
Date: Mon, 08 Sep 2008 09:34:32 +0200
Subject: [Biopython-dev] Nexus issue on Windows with Python 2.3
In-Reply-To: <320fb6e00809060206q88a44cblb1188380ae921dde@mail.gmail.com>
References: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com>
<320fb6e00809060206q88a44cblb1188380ae921dde@mail.gmail.com>
Message-ID: <48C4D588.701@biologie.uni-kl.de>
Peter wrote:
> On Sat, Sep 6, 2008 at 12:19 AM, Peter wrote:
>
>> Can anyone using Windows and/or Python 2.3 try running the test suite?
>>
>> I'm seeing a problem with test_SeqIO.py and test_AlignIO.py when they
>> call Bio.Nexus to construct a particular Nexus object (using sequences
>> originally read in from Tests/Nexus/test_Nexus_input.nex for what it's
>> worth). This triggers:
>>
>> TypeError: zip() requires at least one sequence
>>
>> On Linux with Python 2.4, and Mac OS X with Python 2.5 these two tests
>> both passed for me.
>>
>
> This was a python 2.3 problem, the Nexus code (Bio/Nexus/Nexus.py line
> 1633) was using a Python 2.4+ only feature, see
> http://docs.python.org/lib/built-in-funcs.html
>
> sitesm=zip(*[self.matrix[t].tostring() for t in self.taxlabels])
>
> I've added a check in CVS for an empty list of taxlabels (with a
> comment about python 2.3).
>
>
Good catch. Thanks Peter for fixing it.
> Peter
--
J-Prof. Dr. Frank Kauff
Molecular Phylogenetics
FB Biologie, 13/276
TU Kaiserslautern
Postfach 3049
67653 Kaiserslautern
Tel. +49 (0)631 205-2562
Fax. +49 (0)631 205-2998
email: fkauff at biologie.uni-kl.de
skype: frank.kauff
From p.j.a.cock at googlemail.com Mon Sep 8 05:41:33 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 10:41:33 +0100
Subject: [Biopython-dev] test_SVDSuperimposer.py
Message-ID: <320fb6e00809080241x3db79410lf54dd1612e5e04cc@mail.gmail.com>
Hi all,
I've noticed test_SVDSuperimposer.py seems to stall/run forever on
one of the Linux machines I have run it on. However, on my main
Linux machine it is fine, and on Mac OS X. Has anyone else noticed
this? Maybe there is some common thread (e.g. version of Numeric or
something).
Peter
From p.j.a.cock at googlemail.com Mon Sep 8 06:43:17 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 11:43:17 +0100
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <364233.84511.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00809080241x3db79410lf54dd1612e5e04cc@mail.gmail.com>
<364233.84511.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00809080343t78e068een8e50d0237d9852c8@mail.gmail.com>
On Mon, Sep 8, 2008 at 11:36 AM, Michiel de Hoon wrote:
> When installing Numerical Python, run
>
> python setup.py config
>
> before build, install.
> (assuming you are using Numerical Python version 24.2).
>
> --Michiel.
I've checked and it is version 24.2 that is installed on the machine
in question. I'm not sure if this was installed from source or via
the yum package manager, but Numeric seems to work.
$ python
Python 2.5 (r25:51908, Nov 23 2006, 18:40:28)
[GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>> Numeric.__version__
'24.2'
I imagine if there was something seriously wrong with this Numeric
installation it would have shown up in other unit tests. So it looks
like the version of Numeric isn't the issue. Any other ideas?
I take it you've never had a problem with test_SVDSuperimposer.py getting stuck?
Thanks,
Peter
From mjldehoon at yahoo.com Mon Sep 8 06:36:44 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Sep 2008 03:36:44 -0700 (PDT)
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <320fb6e00809080241x3db79410lf54dd1612e5e04cc@mail.gmail.com>
Message-ID: <364233.84511.qm@web62401.mail.re1.yahoo.com>
When installing Numerical Python, run
python setup.py config
before build, install.
(assuming you are using Numerical Python version 24.2).
--Michiel.
--- On Mon, 9/8/08, Peter Cock wrote:
> From: Peter Cock
> Subject: [Biopython-dev] test_SVDSuperimposer.py
> To: "BioPython-Dev Mailing List"
> Date: Monday, September 8, 2008, 5:41 AM
> Hi all,
>
> I've noticed test_SVDSuperimposer.py seems to stall/run forever on
> one of the Linux machines I have run it on. However, on my main
> Linux machine it is fine, and on Mac OS X. Has anyone else noticed
> this? Maybe there is some common thread (e.g. version of Numeric or
> something).
>
> Peter
From biopython at maubp.freeserve.co.uk Mon Sep 8 07:20:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Sep 2008 12:20:57 +0100
Subject: [Biopython-dev] CVS freeze for Biopython 1.48
In-Reply-To: <320fb6e00809050343y598c436fi9aa65ec272f1492d@mail.gmail.com>
References: <320fb6e00809050343y598c436fi9aa65ec272f1492d@mail.gmail.com>
Message-ID: <320fb6e00809080420y5f456ab4uc4ec42d845c25f1a@mail.gmail.com>
On Fri, Sep 5, 2008 at 11:43 AM, Peter wrote:
> Dear all,
>
> I'm going to try and put together Biopython 1.48 this afternoon, so
> could you all not commit any changes until further notice please.
This took longer than planned. I have now tagged CVS and uploaded the
tar-ball and zip file for Biopython 1.48 to http://biopython.org/DIST/
as usual. Before we make the public announcement (email, news server,
and wiki pages), having one or two people try downloading these,
installing from source and running the unit tests would be great.
Little things (like documentation improvements!) can go into CVS now,
but could you all refrain from any major changes (like Numeric/numpy,
additional deprecations or removals) until the release has been public
for a few days without issue? Just in case we have to tweak things,
this would make dealing with CVS easier. Thanks.
> I'll be doing the source code releases (and possibly a Windows
> installer for Python 2.3 tonight if my old laptop still has all the MS
> compilers working), but there will then be a slight delay while we get
> the (other) Windows installers done.
I may be able to do the Python 2.3 Windows installer tonight - we'll see.
For future reference, hevea 1.08 (which my Linux box had installed)
doesn't work nicely on the tutorial (the title page information goes
missing), but hevea 1.10 is fine.
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html
Also, we're currently using epydoc version 3.0.1 for the API documentation:
http://biopython.org/DIST/docs/api/
I did check in a few more module level docstrings so this does look a
bit more complete than in Biopython 1.47. There is still room for
improvement, for example Bio.SeqUtils needs some love. Also many of
the deprecated modules don't say they are deprecated in the module
level docstring, which I think would be a good thing to do. Any views on this?
Thanks,
Peter
From tiagoantao at gmail.com Mon Sep 8 07:42:51 2008
From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=)
Date: Mon, 8 Sep 2008 12:42:51 +0100
Subject: [Biopython-dev] New line issues in the source zip or tarballs
In-Reply-To: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
Message-ID: <6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com>
Hi,
On Sat, Sep 6, 2008 at 11:04 AM, Peter wrote:
> In the case of test_PopGen_SimCoal_nodepend.py the failure is
> expecting simple.par and simple_100_30.par to be exactly the same size
> (in class TemplateTest, line 47). This is not going to be true
> when the input file uses Unix new lines but the generated file uses
> Windows new lines. Perhaps using a simple bit of code to load the
> files line by line and compare them would work here?
>
I am currently at a workshop (I belong to the organization committee, so I
don't have much time), but I will try to sort this in the next couple of
days.
Tiago
From biopython at maubp.freeserve.co.uk Mon Sep 8 08:14:09 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 8 Sep 2008 13:14:09 +0100
Subject: [Biopython-dev] New line issues in the source zip or tarballs
In-Reply-To: <6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com>
References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>
<6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com>
Message-ID: <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com>
Tiago wrote:
> Peter wrote:
>> In the case of test_PopGen_SimCoal_nodepend.py the failure is
>> expecting simple.par and simple_100_30.par to be exactly the same size
>> (in class TemplateTest, line 47). This is not going to be true
>> when the input file uses Unix new lines but the generated file uses
>> Windows new lines. Perhaps using a simple bit of code to load the
>> files line by line and compare them would work here?
>
> I am currently at a workshop (I belong to the organization committee, so I
> don't have much time), but I will try to sort this in the next couple of
> days.
Hi Tiago,
This new line issue has probably been there since Biopython 1.45
without anyone else spotting it, so I don't see fixing it as urgent.
Hopefully we can resolve this for the next release instead.
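For reference, the line-by-line comparison suggested above could look
roughly like this. It is only a minimal sketch, not the actual test code,
and the function name same_lines is made up for this example. It reads
both files in binary mode and compares their lines with the line endings
stripped, so Unix and Windows new lines are treated as equivalent:

```python
def same_lines(path_a, path_b):
    """Return True if both files hold the same lines, ignoring new line style."""
    with open(path_a, "rb") as handle_a:
        # splitlines() splits on \n, \r\n and \r, and drops the terminators.
        lines_a = handle_a.read().splitlines()
    with open(path_b, "rb") as handle_b:
        lines_b = handle_b.read().splitlines()
    return lines_a == lines_b
```

A test could then assert same_lines("simple.par", "simple_100_30.par")
instead of comparing the two file sizes.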
I hope your workshop goes well,
Peter
From mjldehoon at yahoo.com Mon Sep 8 08:11:56 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Sep 2008 05:11:56 -0700 (PDT)
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <320fb6e00809080343t78e068een8e50d0237d9852c8@mail.gmail.com>
Message-ID: <339043.58717.qm@web62407.mail.re1.yahoo.com>
Check whether the eigenvalues function in Numerical Python works. If it hangs, you'll know the problem is in Numerical Python.
--Michiel
--- On Mon, 9/8/08, Peter Cock wrote:
> From: Peter Cock
> Subject: Re: [Biopython-dev] test_SVDSuperimposer.py
> To: mjldehoon at yahoo.com
> Cc: "BioPython-Dev Mailing List"
> Date: Monday, September 8, 2008, 6:43 AM
> On Mon, Sep 8, 2008 at 11:36 AM, Michiel de Hoon wrote:
> > When installing Numerical Python, run
> >
> > python setup.py config
> >
> > before build, install.
> > (assuming you are using Numerical Python version 24.2).
> >
> > --Michiel.
>
> I've checked and it is version 24.2 that is installed on the machine
> in question. I'm not sure if this was installed from source or via
> the yum package manager, but Numeric seems to work.
>
> $ python
> Python 2.5 (r25:51908, Nov 23 2006, 18:40:28)
> [GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import Numeric
> >>> Numeric.__version__
> '24.2'
>
> I imagine if there was something seriously wrong with this Numeric
> installation it would have shown up in other unit tests. So it looks
> like the version of Numeric isn't the issue. Any other ideas?
>
> I take it you've never had a problem with
> test_SVDSuperimposer.py getting stuck?
>
> Thanks,
>
> Peter
From p.j.a.cock at googlemail.com Mon Sep 8 08:24:01 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 13:24:01 +0100
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <339043.58717.qm@web62407.mail.re1.yahoo.com>
References: <320fb6e00809080343t78e068een8e50d0237d9852c8@mail.gmail.com>
<339043.58717.qm@web62407.mail.re1.yahoo.com>
Message-ID: <320fb6e00809080524i1f75c601p2a7191b6207bd2e@mail.gmail.com>
On Mon, Sep 8, 2008 at 1:11 PM, Michiel de Hoon wrote:
> Check whether the eigenvalues function in Numerical Python works. If it hangs, you'll know the problem is in Numerical Python.
Good thinking - it does indeed hang on the machine in question,
$ python
Python 2.5 (r25:51908, Nov 23 2006, 18:40:28)
[GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric, LinearAlgebra
>>> Numeric.__version__
'24.2'
>>> data = Numeric.array([[1,2,3],[4,5,6],[7,8,9]])
>>> LinearAlgebra.eigenvalues(data)
[hangs here]
This works fine on another Linux box,
$ python
Python 2.4.3 (#1, Jun 27 2006, 16:32:39)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric, LinearAlgebra
>>> Numeric.__version__
'24.2'
>>> data = Numeric.array([[1,2,3],[4,5,6],[7,8,9]])
>>> LinearAlgebra.eigenvalues(data)
array([ 1.61168440e+01, -1.11684397e+00, -1.30367773e-15])
And this example also works on the Mac:
$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric, LinearAlgebra
>>> Numeric.__version__
'24.2'
>>> data = Numeric.array([[1,2,3],[4,5,6],[7,8,9]])
>>> LinearAlgebra.eigenvalues(data)
array([ 1.61168440e+01, -1.11684397e+00, -1.30367773e-15])
So we can probably rule out a problem with Biopython in
test_SVDSuperimposer.py which is good, but I should probably try and
work out what is wrong with Numeric on this particular machine...
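As an aside, a check like the one above can be run without freezing the
interactive session by executing it in a child process with a timeout.
The following is only a sketch for a current Python (it relies on
subprocess.run, which did not exist in Python 2.5), and the helper name
snippet_hangs is invented for this example:

```python
import subprocess
import sys


def snippet_hangs(code, timeout=10):
    """Run code in a child Python process; return True if it exceeds timeout."""
    try:
        subprocess.run(
            [sys.executable, "-c", code],
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return True
    return False
```

The code argument would be the few lines being tested, passed as a
string. Note that a snippet which fails quickly with an error counts as
"not hanging" here, so this only answers the question of whether the
call returns at all.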
Thanks for your advice,
Peter
From mjldehoon at yahoo.com Mon Sep 8 08:56:37 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Sep 2008 05:56:37 -0700 (PDT)
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <320fb6e00809080524i1f75c601p2a7191b6207bd2e@mail.gmail.com>
Message-ID: <622098.12504.qm@web62404.mail.re1.yahoo.com>
> So we can probably rule out a problem with Biopython in
> test_SVDSuperimposer.py which is good, but I should probably try and
> work out what is wrong with Numeric on this particular machine...
>
Have a look at this:
http://projects.scipy.org/pipermail/numpy-discussion/2004-January/015074.html
--Michiel.
From quwubin at gmail.com Mon Sep 8 09:59:57 2008
From: quwubin at gmail.com (Wubin Qu)
Date: Mon, 8 Sep 2008 21:59:57 +0800
Subject: [Biopython-dev] BioPythonGUI: Graphical User Interface for BioPython
Message-ID:
Hi all,
I started a new project named BioPythonGUI a few days ago. The following
is the 'About' page from the BioPythonGUI project.
BioPythonGUI is a Graphical User Interface of BioPython.
BioPython is a widely used set of Python modules for bioinformatics. It helps
researchers with:
- Parsing files in different database formats
- Interfaces into programs like Blast, Entrez and PubMed
- A sequence class (can transcribe, translate, invert, etc)
- Code for handling alignments of sequences
- Clustering algorithms
- etc.
However, not everyone can use BioPython, especially those who do not
know much about programming. How can you expect a professor who has never
learned any programming to use BioPython to parse a BLAST report file?
This is the problem that BioPythonGUI is meant to solve. I started the
project with the goal "Everyone can use BioPython with BioPythonGUI".
So far, two modules, SeqGUI and BlastGUI, are available in
BioPythonGUI. I would greatly appreciate it if you would try BioPythonGUI and send me
your feedback.
Please see the developer's blog for details.
Project Blog: http://biopythongui.blogspot.com/
Download: https://sites.google.com/site/biopythongui/download
Screenshots: http://picasaweb.google.com/quwubin/BioPythonGUI02#
______________________________
Best regards,
Wubin Qu
From p.j.a.cock at googlemail.com Mon Sep 8 10:12:15 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 15:12:15 +0100
Subject: [Biopython-dev] [BioPython] BioPythonGUI: Graphical User
Interface for BioPython
In-Reply-To:
References:
Message-ID: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com>
On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote:
> Hi all,
>
> I started a new project named BioPythonGUI a few days ago.
Hello Wubin Qu,
> BioPythonGUI is a Graphical User Interface of BioPython.
I'm uncomfortable about the name BioPythonGUI, as this to me implies
it is part of Biopython (whereas it is currently just a third party
project built on top of Biopython). What do other people think?
> However, not everyone can use BioPython, especially those who do not
> know much about programming. How can you expect a professor who has never
> learned any programming to use BioPython to parse a BLAST report file?
> This is the problem that BioPythonGUI is meant to solve. I started the
> project with the goal "Everyone can use BioPython with BioPythonGUI".
I don't really understand your goal. How would a non-programming
professor use your program to parse a BLAST report file? The NCBI
already try and make the HTML and plain text output useful to
non-programmers and from looking at the screenshots I don't see how
your tool would help.
> So far, two modules, SeqGUI and BlastGUI, are available in
> BioPythonGUI. I would greatly appreciate it if you would try BioPythonGUI and send me
> your feedback.
I see your module SeqGUI builds on the SeqGui.py in BioPython (in the
scripts directory). It might make sense to include your improvements
to this code as part of Biopython. I haven't looked at your code yet,
so I don't know how much you've changed things.
It is nice to be able to translate, transcribe, reverse
complement etc in a GUI, but personally I don't see the point of
writing a little application just for this. Also, there are probably
many existing tools out there that already offer this
functionality. However, I am happy writing code, so I am not in your
target audience.
Regarding your BlastGUI idea, I can see that a GUI for standalone
blast is nicer than the command line for some people. However, I
don't see how this is more useful than running a local blast web
server (something the NCBI already provides).
Sorry for being so negative,
Peter
From quwubin at gmail.com Mon Sep 8 10:38:27 2008
From: quwubin at gmail.com (Wubin Qu)
Date: Mon, 8 Sep 2008 22:38:27 +0800
Subject: [Biopython-dev] [BioPython] BioPythonGUI: Graphical User
Interface for BioPython
In-Reply-To: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com>
References:
<320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com>
Message-ID:
Hi Peter,
Thanks for your reply.
My goal is simple: programs with a GUI are easy to use, so a GUI for
BioPython will make it accessible to more people.
The next module is BlastParserGUI. I think it will be useful.
Yes, SeqGUI is built on SeqGui.py, and I learned a lot from SeqGui.py. It
inspired me to build the other modules. I mentioned this here.
______________________________
Best regards,
Wubin Qu
From biopython at maubp.freeserve.co.uk Tue Sep 9 06:14:11 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Sep 2008 11:14:11 +0100
Subject: [Biopython-dev] Biopython 1.48 released
Message-ID: <320fb6e00809090314s722f404bqda71d7d9f97360e7@mail.gmail.com>
We are pleased to announce the release of Biopython 1.48. Some new
functionality has been added, a few bugs have been fixed, the
documentation has been updated, plus several obsolete modules have
been deprecated (or explicitly labelled as obsolete).
The following additional file formats are now supported in Bio.SeqIO
and Bio.AlignIO:
* reading and writing "tab" format (simple tab separated)
* writing "nexus" files
* reading "pir" files (NBRF/PIR)
* basic support for writing "genbank" files (GenBank plain text)
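To give an idea of what the simple tab separated layout looks like, here
is a small standard-library sketch. It is not the Bio.SeqIO
implementation; it assumes each line holds an identifier and a sequence
separated by a single tab, and the function names read_tab and write_tab
are made up for this illustration:

```python
def read_tab(lines):
    """Yield (identifier, sequence) pairs from tab separated lines."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        identifier, sequence = line.split("\t", 1)
        yield identifier, sequence


def write_tab(records):
    """Return (identifier, sequence) pairs as tab separated text."""
    return "".join(
        "%s\t%s\n" % (identifier, sequence) for identifier, sequence in records
    )
```

Writing a list of pairs with write_tab and reading the result back with
read_tab should give the original pairs again.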
This release also fixes some problems reading Clustal alignments
(introduced in Biopython 1.46 when consolidating Bio.AlignIO and
Bio.Clustalw), and some updates to the Bio.Sequencing parsers.
The SeqRecord and Alignment objects have a new method to get the
object as a string in a given file format (handled via Bio.SeqIO and
Bio.AlignIO).
Bio.PubMed and the online code in Bio.GenBank are now considered
obsolete, and we intend to deprecate them after the next release. For
accessing PubMed and GenBank, please use Bio.Entrez instead. Martel
and Bio.Mindy are now considered to be obsolete, and are likely to be
deprecated and removed in a future release, at which point we will
drop the optional dependency on mxTextTools. Bio.Fasta is also
considered to be obsolete, please use Bio.SeqIO instead. We do intend
to deprecate this module eventually, however, for several years this
was the primary FASTA parsing module in Biopython and is likely to be
in use in many existing scripts.
In addition a number of other modules have been deprecated, including:
Bio.MetaTool, Bio.EUtils, Bio.Saf, Bio.NBRF, and Bio.IntelliGenetics -
see the DEPRECATED file for full details.
Source distributions are available from the Biopython website at
http://biopython.org, and Windows installers will be added shortly.
My thanks to all bug reporters, code contributors and others who made
this new release possible.
Peter, on behalf of the Biopython developers
P.S. This message will be forwarded to the Biopython announcement
mailing list shortly.
For those of you who prefer news readers to email lists, have a look
at the OBF news server:
http://news.open-bio.org/news/2008/09/biopython-release-148/
where there are Biopython news feeds available:
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2
http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom
From mhampton at d.umn.edu Tue Sep 9 10:21:15 2008
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Tue, 9 Sep 2008 09:21:15 -0500 (CDT)
Subject: [Biopython-dev] BioPythonGUI: Graphical User Interface for BioPython
Message-ID:
Hi,
I think I'd mentioned this before on this list, but the BioPythonGUI post
made me think I should again: people interested in GUIs and visualization
with biopython should check out the Sage project: www.sagemath.org. I
help maintain the inclusion of biopython in Sage as an optional package.
Sage is a python-based computational platform that unites a great deal of
mathematical software, and uses a web browser as its GUI. This makes
sharing code very easy.
I teach a bioinformatics course using Sage and biopython, which has been
working very well.
I wrote a brief introduction that gives some idea of what is possible with
the Sage/biopython combination:
http://openwetware.org/wiki/Open_writing_projects/Sage_and_cython_a_brief_introduction
...and the Sage wiki @interact examples might also give some ideas:
http://wiki.sagemath.org/interact
Cheers,
Marshall Hampton
University of Minnesota, Duluth
PS. Sorry if this gets sent twice, I think I messed up the list address
the first time.
From quwubin at gmail.com Tue Sep 9 20:28:55 2008
From: quwubin at gmail.com (Wubin Qu)
Date: Wed, 10 Sep 2008 08:28:55 +0800
Subject: [Biopython-dev] BioPythonGUI: Graphical User Interface for
BioPython
In-Reply-To:
References:
Message-ID:
Hi,
Thank you.
I am learning Sage now.
I think it is another approach to a GUI, and it's great. I'm sure I will learn a
lot from Sage.
______________________________
Best regards,
Wubin Qu
From bugzilla-daemon at portal.open-bio.org Wed Sep 10 05:03:43 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Sep 2008 05:03:43 -0400
Subject: [Biopython-dev] [Bug 2583] New: small bug in NCBIXML.py
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2583
Summary: small bug in NCBIXML.py
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: minor
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: christen at unice.fr
Hi
When parsing XML BLAST output,
b_record.sc_match and b_record.sc_mismatch are returned as None.
This is because the lines
self._blast.sc_match=self._parameters.sc_match
self._blast.sc_mismatch=self._parameters.sc_mismatch
are missing in
def _end_Iteration(self):
This is a minor bug because it is very rare that a user wants this
information, as usually they know the parameters they used to run blast.
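The effect of the two missing lines can be shown with a toy example. This
is only a sketch under stated assumptions: the Parameters and Record
classes and the end_iteration function below are hypothetical stand-ins,
not the real NCBIXML parser classes. The point is that the scoring
parameters are parsed into one object and must be copied onto each record
when an iteration ends, otherwise the record keeps its default of None:

```python
class Parameters(object):
    """Hypothetical stand-in for the parameters parsed from the report."""

    def __init__(self, sc_match=None, sc_mismatch=None):
        self.sc_match = sc_match
        self.sc_mismatch = sc_mismatch


class Record(object):
    """Hypothetical stand-in for the per-iteration BLAST record."""

    def __init__(self):
        self.sc_match = None
        self.sc_mismatch = None


def end_iteration(record, parameters):
    """Copy the scoring parameters onto the record being finished."""
    record.sc_match = parameters.sc_match
    record.sc_mismatch = parameters.sc_mismatch
    return record
```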
Best regards
Richard Christen, U of Nice, France
From mjldehoon at yahoo.com Wed Sep 10 10:28:03 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 10 Sep 2008 07:28:03 -0700 (PDT)
Subject: [Biopython-dev] NumPy conversion roadmap
Message-ID: <909028.58880.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
Now that Biopython 1.48 is out (thanks Peter!), we can now start to consider the conversion from Numerical Python to NumPy.
I'd like to propose the following steps:
1) Let's wait for a week or so before making any NumPy-related commits to see if any serious problems show up with the 1.48 release.
2) Three modules use Numerical Python at the C-level: Bio.Cluster, Bio.KDTree, and Bio.Affy. I have a NumPy-based module ready for Bio.Cluster. For Bio.KDTree and Bio.Affy, see my next mails.
3) Once these three modules are converted, Biopython can be compiled again. We can then consider the modules that use Numerical Python at the Python-level. There are about ten of those. Some of them are heavily used (such as Bio.PDB), whereas others are more obscure. Conversion is usually trivial, but I'd like to suggest that we take this opportunity also to review each of these modules to see if any should be deprecated.
Comments, anybody?
--Michiel.
From mjldehoon at yahoo.com Wed Sep 10 10:28:53 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 10 Sep 2008 07:28:53 -0700 (PDT)
Subject: [Biopython-dev] Bio.Affy
Message-ID: <790264.35889.qm@web62401.mail.re1.yahoo.com>
Hi everybody,
The C++ code in Bio.Affy seems to be out of date; it is distributed with the Biopython releases but it is not actually used. There's a comment in setup.py saying that this C++ code was replaced by Python code. Does anybody know more about this? Can the C++ code in Bio.Affy be removed?
--Michiel.
From mjldehoon at yahoo.com Wed Sep 10 10:35:22 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 10 Sep 2008 07:35:22 -0700 (PDT)
Subject: [Biopython-dev] Bio.KDTree
Message-ID: <941377.41064.qm@web62401.mail.re1.yahoo.com>
Hi everybody,
I have a prototype version of Bio.KDTree for NumPy. This code differs from the current Bio.KDTree in that it uses C instead of C++. Thomas (or anybody else), any objections if I upload this version to CVS to replace the current Bio.KDTree?
--Michiel.
From chapmanb at 50mail.com Wed Sep 10 16:26:13 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Wed, 10 Sep 2008 16:26:13 -0400
Subject: [Biopython-dev] NumPy conversion roadmap
Message-ID: <20080910202613.GR21009@localdomain>
Hi all;
Congrats on the 1.48 release. Great stuff.
Michiel, I wanted to follow up on your NumPy conversion plans. I
have those NumPy changes discussed on the main list earlier this
month ready to check in, along with tests passing and documentation
changes and all those good things.
This does very basic conversions to NumPy using the compatibility
modules. It sounds like a good path would be for me to check these
changes in as a starting point and then you can go with your in
depth changes from there. Hopefully, this will save you some time finding all
the imports and that kind of fun.
Any objections? If not, I can get these in right away and you can go
from there.
Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com
From bugzilla-daemon at portal.open-bio.org Wed Sep 10 20:29:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Sep 2008 20:29:42 -0400
Subject: [Biopython-dev] [Bug 2585] New: Error in
Bio.SeqUtils.apply_on_multi_fasta
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2585
Summary: Error in Bio.SeqUtils.apply_on_multi_fasta
Product: Biopython
Version: 1.48
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
Function "apply_on_multi_fasta" (in SeqUtils) uses record attributes that are
no longer valid. See this line:
arguments = [record.sequence]
And this line:
results.append('>%s\n%s' % (record.title, result))
This causes an error when trying to run this function (sorry I don't have the
error message on this computer).
A possible replacement for both lines:
arguments = [record.seq]
and:
results.append('>%s\n%s' % (record.name, result))
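To show the proposed attribute names in context, here is a minimal
sketch. It is not the actual Bio.SeqUtils code: the Record namedtuple is
a stand-in for a record object that has .name and .seq attributes, and
the function name apply_to_records is made up for this example:

```python
from collections import namedtuple

# Stand-in for a record object exposing .name and .seq (an assumption
# made for this example, so that it runs without Biopython installed).
Record = namedtuple("Record", ["name", "seq"])


def apply_to_records(records, function, *extra_args):
    """Apply function to each record's sequence; return FASTA-style strings."""
    results = []
    for record in records:
        arguments = [record.seq]
        arguments.extend(extra_args)
        result = function(*arguments)
        if result:
            results.append('>%s\n%s' % (record.name, result))
    return results
```

For example, apply_to_records([Record("seq1", "acgt")], str.upper) gives
a one-item list holding the string ">seq1" followed by a new line and
"ACGT".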
From bugzilla-daemon at portal.open-bio.org Thu Sep 11 04:01:52 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Sep 2008 04:01:52 -0400
Subject: [Biopython-dev] [Bug 2586] New: New version of MeltingTemp.py
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2586
Summary: New version of MeltingTemp.py
Product: Biopython
Version: 1.48
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: sbassi at gmail.com
This version of MeltingTemp.py has a quick test and some reformatting to make
it easier to read (code style changed)
From bugzilla-daemon at portal.open-bio.org Thu Sep 11 04:03:54 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 11 Sep 2008 04:03:54 -0400
Subject: [Biopython-dev] [Bug 2586] New version of MeltingTemp.py
In-Reply-To:
Message-ID: <200809110803.m8B83sr2017438@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2586
------- Comment #1 from sbassi at gmail.com 2008-09-11 04:03 EST -------
Created an attachment (id=994)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=994&action=view)
New version of MeltingTemp.py
From mjldehoon at yahoo.com Thu Sep 11 07:06:45 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 11 Sep 2008 04:06:45 -0700 (PDT)
Subject: [Biopython-dev] NumPy conversion roadmap
In-Reply-To: <20080910202613.GR21009@localdomain>
Message-ID: <828420.78524.qm@web62408.mail.re1.yahoo.com>
--- On Wed, 9/10/08, Brad Chapman wrote:
> I have those NumPy changes discussed on the main list earlier
> this month ready to check in, along with tests passing and
> documentation changes and all those good things.
>
Thanks!
Those changes are for Bio.PDB, right? Bio.PDB being a heavily used module, your changes are very welcome. In a sense, Thomas has the last word on changes to Bio.PDB, since he wrote the module, but if there are no objections from Thomas then feel free to submit your changes to CVS.
--Michiel.
From chapmanb at 50mail.com Thu Sep 11 07:56:33 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Thu, 11 Sep 2008 07:56:33 -0400
Subject: [Biopython-dev] NumPy conversion roadmap
In-Reply-To: <828420.78524.qm@web62408.mail.re1.yahoo.com>
References: <20080910202613.GR21009@localdomain>
<828420.78524.qm@web62408.mail.re1.yahoo.com>
Message-ID: <20080911115633.GD6200@localdomain>
Hi Michiel;
> Thanks!
> Those changes are for Bio.PDB, right? Bio.PDB being a heavily used
> module, your changes are very welcome. In a sense, Thomas has the last
> word on changes to Bio.PDB, since he wrote the module, but if there
> are no objections from Thomas then feel free to submit your changes to
> CVS.
Yes, these handle PDB and all other Numeric modules. The changes are
to the imports rather than to the code itself, so they are slight. We can
move forward from here to a full port to NumPy if desired, but this
should give the same functionality while allowing people to use the
up-to-date NumPy libraries.
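The general idea of changing only the imports can be sketched with a
small fallback helper. This is an illustration only, not the change that
went into CVS; the helper name import_first_available is invented here,
and it simply returns the first module in a list that can be imported:

```python
import importlib


def import_first_available(module_names):
    """Return the first module from module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )
```

Code written against the returned module then only works unchanged if
the candidates offer the same interface, which is the role a
compatibility module plays.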
I have checked everything in, so it should appear in CVS now. Let me
know if there are any problems, and feel free to improve on these
changes as y'all find best.
Thanks,
Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com
From mjldehoon at yahoo.com Sun Sep 14 09:23:53 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 14 Sep 2008 06:23:53 -0700 (PDT)
Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster.
Message-ID: <259643.39583.qm@web62401.mail.re1.yahoo.com>
Hi everybody,
I just committed a bunch of changes to Bio.Cluster, Bio.KDTree, and setup.py that deal with the old Numerical Python to new NumPy conversion. With these changes, Biopython should compile with NumPy; any remaining references to the old Numerical Python are at the Python-level only. Since these are rather big changes, please try with the current version of CVS to see if everything compiles cleanly and all tests pass. Comments, questions, suggestions are welcome.
I also uploaded a plain C (instead of C++) version of Bio.KDTree, and adjusted setup.py accordingly.
--Michiel.
From chapmanb at 50mail.com Sun Sep 14 13:07:07 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Sun, 14 Sep 2008 13:07:07 -0400
Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster.
In-Reply-To: <259643.39583.qm@web62401.mail.re1.yahoo.com>
References: <259643.39583.qm@web62401.mail.re1.yahoo.com>
Message-ID: <1221412027.6552.1273897487@webmail.messagingengine.com>
Hi Michiel;
Great stuff. One quick note, it doesn't look like KDTreemodule.c got
checked into CVS:
gcc: Bio/KDTree/KDTreemodule.c: No such file or directory
> ls -l Bio/KDTree/
total 64
-rw-rw-r-- 1 chapmanb chapmanb 2641 2007-04-23 05:45 CKDTree.py
drwxrwxr-x 2 chapmanb chapmanb 4096 2008-09-14 12:39 CVS
-rw-rw-r-- 1 chapmanb chapmanb 166 2007-04-23 05:45 HISTORY
-rw-rw-r-- 1 chapmanb chapmanb 432 2007-04-23 05:45 __init__.py
-rw-rw-r-- 1 chapmanb synbio 29504 2008-09-14 09:15 KDTree.c
-rw-rw-r-- 1 chapmanb synbio 689 2008-09-14 12:39 KDTree.h
-rw-rw-r-- 1 chapmanb synbio 8165 2008-09-14 12:39 KDTree.py
-rw-rw-r-- 1 chapmanb synbio 151 2008-09-14 09:15 Neighbor.h
Brad
--
Brad Chapman
chapmanb at 50mail.com
From mjldehoon at yahoo.com Sun Sep 14 13:10:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 14 Sep 2008 10:10:16 -0700 (PDT)
Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster.
In-Reply-To: <259643.39583.qm@web62401.mail.re1.yahoo.com>
Message-ID: <673504.47838.qm@web62403.mail.re1.yahoo.com>
Hi everybody,
I just noticed that one file is missing in Biopython's CVS. I'll upload it as soon as possible but it may take a day or so. Sorry for the trouble.
--Michiel
From mjldehoon at yahoo.com Tue Sep 16 07:12:03 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 16 Sep 2008 04:12:03 -0700 (PDT)
Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster.
In-Reply-To: <1221412027.6552.1273897487@webmail.messagingengine.com>
Message-ID: <901846.37002.qm@web62403.mail.re1.yahoo.com>
Hi everybody,
I have now uploaded Bio/KDTree/KDTreemodule.c to CVS. Biopython should now compile with the new NumPy (and no longer with the old Numerical Python).
--Michiel.
From bugzilla-daemon at portal.open-bio.org Tue Sep 16 16:04:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Sep 2008 16:04:20 -0400
Subject: [Biopython-dev] [Bug 2583] small bug in NCBIXML.py
In-Reply-To:
Message-ID: <200809162004.m8GK4KKj016559@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2583
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-16 16:04 EST -------
I'd have looked at this earlier but was on holiday. I recall fixing a few
similar issues in the past, but hadn't spotted these. I'll try and deal with
this by the end of the week. Thanks Christen! Peter.
From bugzilla-daemon at portal.open-bio.org Tue Sep 16 16:07:07 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 16 Sep 2008 16:07:07 -0400
Subject: [Biopython-dev] [Bug 2585] Error in
Bio.SeqUtils.apply_on_multi_fasta
In-Reply-To:
Message-ID: <200809162007.m8GK77S2016681@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2585
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-16 16:07 EST -------
That does look like a bug - I'll have to look over the history to see how this
was originally intended to be used as the current docstring isn't very clear.
Another option would be something like:
results.append(record.format("fasta"))
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 05:38:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 05:38:23 -0400
Subject: [Biopython-dev] [Bug 2583] small bug in NCBIXML.py
In-Reply-To:
Message-ID: <200809170938.m8H9cN2Y024263@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2583
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 05:38 EST -------
Christen - what version of Biopython are you using?
The reason the match/mis-match issue sounded so familiar to me is I fixed it in
Biopython 1.46 after Sebastian Bassi reported it on the mailing list in March.
If you can confirm you are using Biopython 1.45 or older, then could you try
updating your machine? We should then be able to mark this bug as fixed.
Thanks
Peter
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 07:17:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 07:17:23 -0400
Subject: [Biopython-dev] [Bug 2586] New version of MeltingTemp.py
In-Reply-To:
Message-ID: <200809171117.m8HBHNC9028796@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2586
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 07:17 EST -------
Update checked in.
Thanks
Peter
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 07:34:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 07:34:34 -0400
Subject: [Biopython-dev] [Bug 2585] Error in
Bio.SeqUtils.apply_on_multi_fasta
In-Reply-To:
Message-ID: <200809171134.m8HBYYte029584@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2585
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 07:34 EST -------
This bug was introduced in CVS revision 1.13 of Bio/SeqUtils/__init__.py when
moving from Bio.Fasta.RecordParser (which used Fasta record objects with a
title property) to Bio.SeqIO (which uses SeqRecord objects instead). With
hindsight this is a clear oversight (which also changed the usage of the
function). Your fix looks fine for recovering some of the original behaviour.
We should also clarify the docstrings of these (and other functions in this
module) to make it explicit where the "file" argument should be a filename.
However, I am tempted to deprecate apply_on_multi_fasta and
quicker_apply_on_multi_fasta (and some of the other code here) as to me using
Bio.SeqIO with a for loop is much clearer.
e.g.
    def my_function ...
    for record in SeqIO.parse(open(filename), "fasta") :
        my_function(record)
versus:
    def my_function ...
    apply_on_multi_fasta(filename, my_function)
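To make the comparison concrete, here is a minimal, self-contained sketch of the two styles. It deliberately does not import Biopython: parse_fasta is a hypothetical stand-in for SeqIO.parse(handle, "fasta") that yields (title, sequence) tuples rather than SeqRecord objects, apply_on_records is a hypothetical stand-in for the callback-style helper (not the actual apply_on_multi_fasta code), and gc_fraction is just an example per-record function.

```python
import io


def parse_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted handle.

    Hypothetical stand-in for SeqIO.parse(handle, "fasta").
    """
    title, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def apply_on_records(handle, function):
    """Callback style: apply function to every record and collect results."""
    return [function(title, seq) for title, seq in parse_fasta(handle)]


def gc_fraction(title, seq):
    """Example per-record function (an assumption for illustration)."""
    gc = sum(1 for base in seq.upper() if base in "GC")
    return title, (gc / len(seq) if seq else 0.0)


FASTA_TEXT = ">seq1\nACGT\nGGCC\n>seq2\nAAAA\n"

# Style 1: an explicit for loop over the parsed records.
loop_results = []
for title, seq in parse_fasta(io.StringIO(FASTA_TEXT)):
    loop_results.append(gc_fraction(title, seq))

# Style 2: hand the function to a helper that hides the loop.
callback_results = apply_on_records(io.StringIO(FASTA_TEXT), gc_fraction)
```

Both styles give the same results; the for loop simply keeps the iteration visible, which is the clarity argument made above.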
What do you think Sebastian? Did you have a real example for using
apply_on_multi_fasta or did you happen to spot the bug?
Thanks,
Peter
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 08:19:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 08:19:40 -0400
Subject: [Biopython-dev] [Bug 2585] Error in
Bio.SeqUtils.apply_on_multi_fasta
In-Reply-To:
Message-ID: <200809171219.m8HCJebF031531@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2585
------- Comment #3 from sbassi at gmail.com 2008-09-17 08:19 EST -------
(In reply to comment #2)
> What do you think Sebastian? Did you have a real example for using
> apply_on_multi_fasta or did you happen to spot the bug?
I don't use this function myself and I also think it is redundant.
I spotted it just because I am checking most Biopython functions for a book on
Python for bioinformatics I am writing.
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 10:12:44 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 10:12:44 -0400
Subject: [Biopython-dev] [Bug 2585] Error in
Bio.SeqUtils.apply_on_multi_fasta
In-Reply-To:
Message-ID: <200809171412.m8HECiP2005746@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2585
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 10:12 EST -------
I've made your suggested fix in CVS, and added a docstring to this and the
related functions. I've described them as obsolete but will also suggest their
deprecation on the mailing list...
Thanks for your report. Peter
From mjldehoon at yahoo.com Wed Sep 17 10:13:50 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Wed, 17 Sep 2008 07:13:50 -0700 (PDT)
Subject: [Biopython-dev] Numpy conversion
Message-ID: <998217.51422.qm@web62404.mail.re1.yahoo.com>
Hi everybody,
I am now looking at the pure-python modules that make use of Numerical Python / NumPy. Bio.kNN is one of them; this also happens to be the only module that imports Bio.distance, which also depends on NumPy.
What I am not sure about is the usage of Bio.kNN. A quick google search didn't reveal much, suggesting that it is not widely used. Bio.kNN currently is not documented in the tutorial, but the code itself is reasonably well documented.
How do you guys feel about this module? Should we keep it?
--Michiel.
From biopython at maubp.freeserve.co.uk Wed Sep 17 10:23:23 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Sep 2008 15:23:23 +0100
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
Message-ID: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
Dear all,
I've previously mentioned the idea of cleaning up
Bio/SeqUtils/__init__.py in passing. I've been reminded about this by
Bug 2585 where Sebastian spotted a problem in one of the FASTA related
functions.
http://bugzilla.open-bio.org/show_bug.cgi?id=2585
I've updated the docstrings in CVS to describe the three functions
quick_FASTA_reader, apply_on_multi_fasta and
quicker_apply_on_multi_fasta as obsolete but I would like to suggest
going further and deprecating them.
There are other dubious or redundant functions in
Bio/SeqUtils/__init__.py such as a translate function. Again, would
there be any objection to deprecating this too?
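As an illustration of what deprecating a function (rather than only describing it as obsolete in the docstring) could look like, here is a small sketch using the standard warnings module. The function quick_fasta_titles is hypothetical and is not one of the Bio.SeqUtils functions; the warning text is likewise only an example.

```python
import warnings


def quick_fasta_titles(text):
    """Obsolete helper (hypothetical): return the FASTA title lines in text."""
    warnings.warn(
        "quick_fasta_titles is deprecated; iterate over parsed records instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return [line[1:] for line in text.splitlines() if line.startswith(">")]


# Record warnings so the effect can be inspected instead of printed.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    titles = quick_fasta_titles(">a\nACGT\n>b\nGG\n")
```

The function keeps working, so existing callers are not broken, but each call now emits a DeprecationWarning pointing at the calling code.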
Peter
From biopython at maubp.freeserve.co.uk Wed Sep 17 10:29:35 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 17 Sep 2008 15:29:35 +0100
Subject: [Biopython-dev] Numpy conversion
In-Reply-To: <998217.51422.qm@web62404.mail.re1.yahoo.com>
References: <998217.51422.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00809170729g49b97488l629c4132c99b44f0@mail.gmail.com>
On Wed, Sep 17, 2008 at 3:13 PM, Michiel de Hoon wrote:
> Hi everybody,
>
> I am now looking at the pure-python modules that make use of Numerical Python / NumPy.
> Bio.kNN is one of them; this also happens to be the only module that imports Bio.distance,
> which also depends on NumPy.
>
> What I am not sure about is the usage of Bio.kNN. A quick google search didn't reveal much,
> suggesting that it is not widely used. Bio.kNN currently is not documented in the tutorial, but
> the code itself is reasonably well documented.
>
> How do you guys feel about this module? Should we keep it?
>
I've not used it myself, but it sounds handy. Michiel, does this
overlap at all with your clustering module?
Peter
From sbassi at gmail.com Wed Sep 17 18:46:23 2008
From: sbassi at gmail.com (Sebastian Bassi)
Date: Wed, 17 Sep 2008 19:46:23 -0300
Subject: [Biopython-dev] Author of "restriction tutorial"?
Message-ID:
I want to cite in a book the Restriction tutorial
(http://biopython.org/DIST/docs/cookbook/Restriction.html) so I need
author(s) name(s).
I can't find the author name so I ask here to cite it properly.
Best,
SB.
--
Island for sale: http://www.genesdigitales.com/isla/
Molecular biology course for programmers: http://tinyurl.com/2vv8w6
Bioinformatics news: http://www.bioinformatica.info
Free Python tutorial: http://tinyurl.com/2az5d5
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 23:15:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 23:15:51 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809180315.m8I3Fpk9008139@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #19 from robert.cadena at gmail.com 2008-09-17 23:15 EST -------
A quick fix to this might be to call a .bat file that calls the blastall
executable. For example:
blastall_wrapper.bat:
"c:/Documents and Settings/maldoror/My Documents/blast/bin/blastall.exe" %1 %2
%3 %4 %5 %6 %7 %8 %9
All arguments containing spaces should be escaped with "\"[arg]". For example:
my_blast_db should be r"\"\\\"c:/documents and settings/maldoror/my
documents/blast/bin/mine\""
When the above value is printed out it should look like:
"\"c:/documents ...."
finally, set my_blastall_exe to the batch file:
"\"c:/documents and settings .../blastall_wrapper.bat\""
You still have to deal with the problem that os.path.exists and os.system
expect the command with and without quotes. but, at least the batch file
wrapper method should pass the arguments properly.
hope it works on your system. best of luck.
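A different way to look at the quoting problem, sketched here only as an illustration: if the command is kept as a list of separate arguments, each argument can be quoted individually when the command line is assembled. The sketch below assumes the legacy blastall flags -p, -d and -i; the query file name is hypothetical, nothing is executed, and this is not what Bio.Blast.NCBIStandalone does. subprocess.list2cmdline is an undocumented helper in the standard library, used here only to display the quoting.

```python
import subprocess


def build_blastall_command(blastall_exe, program, database, query_file):
    """Return the blastall invocation as a list of separate arguments."""
    return [blastall_exe, "-p", program, "-d", database, "-i", query_file]


cmd = build_blastall_command(
    "c:/Documents and Settings/maldoror/My Documents/blast/bin/blastall.exe",
    "blastn",
    "c:/documents and settings/maldoror/my documents/blast/bin/mine",
    "query.fasta",  # hypothetical input file
)

# Show how the argument list maps to a single Windows-style command line:
# arguments containing spaces are wrapped in double quotes, others are not.
command_line = subprocess.list2cmdline(cmd)
```

Passing such a list to subprocess (instead of a hand-built string to os.system) leaves the quoting to the library, although whether blastall itself accepts a database path with spaces is a separate question.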
From biopython at maubp.freeserve.co.uk Thu Sep 18 05:58:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Sep 2008 10:58:04 +0100
Subject: [Biopython-dev] Author of "restriction tutorial"?
In-Reply-To:
References:
Message-ID: <320fb6e00809180258v49f43d4ej7c03551172c9c638@mail.gmail.com>
On Wed, Sep 17, 2008 at 11:46 PM, Sebastian Bassi wrote:
> I want to cite in a book the Restriction tutorial
> (http://biopython.org/DIST/docs/cookbook/Restriction.html) so I need
> author(s) name(s).
> I can't find the author name so I ask here to cite it properly.
> Best,
> SB.
It is a little surprising the author didn't include his name in the
HTML document, but looking at CVS and the mailing list archives, I
think this is by Frederic Sohm.
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/cookbook/Restriction/?cvsroot=biopython
http://portal.open-bio.org/pipermail/biopython/2005-February/002548.html
(I recall reading an earlier thread where Frederic offered the package
with documentation, but I haven't found it again).
Peter
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 06:58:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 06:58:26 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809181058.m8IAwQLw001437@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 06:58 EST -------
I think I've fixed some of this in CVS. Biopython should now cope with a blast
exe or input file with spaces in the name - but thus far I have only tested
this on Mac OS X.
See Bio/Blast/NCBIStandalone.py revision 1.77 in CVS. You will be able to look
at the changes and download them via the following URL shortly:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython
Dealing with database(s) where the path(s) contain spaces is trickier. I think
the best solution here is to set up BLAST so that it knows where to find your
databases, and then you can refer to them by name only (no paths, therefore no
spaces).
From biopython at maubp.freeserve.co.uk Thu Sep 18 08:15:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Sep 2008 13:15:50 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
<320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
<320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
Message-ID: <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com>
I wrote:
>> Bio.expressions was already deprecated, and seems to be a dependency
>> of the following modules, which I have now explicitly deprecated in CVS:
I plan to remove these four deprecated modules shortly, unless anyone objects:
Bio.expressions (deprecated in Biopython 1.44)
Bio.config (explicitly deprecated in Biopython 1.48)
Bio.dbdefs (explicitly deprecated in Biopython 1.48)
Bio.formatdefs (explicitly deprecated in Biopython 1.48)
At the same time I would remove the associated bit of unused code in
Bio/__init__.py
Peter
From mjldehoon at yahoo.com Thu Sep 18 10:10:49 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 18 Sep 2008 07:10:49 -0700 (PDT)
Subject: [Biopython-dev] Numpy conversion
In-Reply-To: <320fb6e00809170729g49b97488l629c4132c99b44f0@mail.gmail.com>
Message-ID: <37659.57326.qm@web62402.mail.re1.yahoo.com>
> I've not used it myself, but it sounds handy. Michiel,
> does this overlap at all with your clustering module?
No, it doesn't. Bio.Cluster contains unsupervised clustering methods only. The k-nearest neighbors in Bio.kNN is a supervised learning method.
--Michiel.
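To illustrate the distinction, here is a minimal pure-Python sketch of k-nearest-neighbors classification. It is supervised in the sense that it needs labelled training points, whereas a clustering method would receive only the points. The function name and signature are assumptions for illustration and are not the Bio.kNN API.

```python
import math
from collections import Counter


def knn_classify(training_points, training_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    distances = sorted(
        (
            # Euclidean distance between a training point and the query.
            math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query))),
            label,
        )
        for point, label in zip(training_points, training_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Labelled training data: this is what makes the method supervised.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_classify(points, labels, (0.5, 0.5))
far_corner = knn_classify(points, labels, (5.5, 5.5))
```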
From biopython at maubp.freeserve.co.uk Thu Sep 18 11:00:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Sep 2008 16:00:16 +0100
Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work?
Message-ID: <320fb6e00809180800ic2752dfoe00801f67b57c65c@mail.gmail.com>
Hi all,
I'm back from my one week holiday, and after I updated my machine I'm
seeing a new failure in test_MarkovModel.py which is probably related to the
Numeric/numpy work,
$ python run_tests.py test_MarkovModel.py
test_MarkovModel ... ERROR
(output cut)
$python test_MarkovModel.py
TESTING train_visible
Training HMM
Classifying
[(['0', '0', '1', '2', '3', '3'], 0.0082128906250000053)]
STATES: 0 1 2 3
ALPHABET: A C G T
INITIAL:
0: 1.00
1: 0.00
2: 0.00
3: 0.00
TRANSITION:
0: 0.20 0.80 0.00 0.00
1: 0.00 0.50 0.50 0.00
2: 0.00 0.00 0.50 0.50
3: 0.00 0.00 0.00 1.00
EMISSION:
0: 0.67 0.11 0.11 0.11
1: 0.08 0.75 0.08 0.08
2: 0.08 0.08 0.75 0.08
3: 0.03 0.03 0.03 0.91
TESTING baum welch
Training HMM
Traceback (most recent call last):
File "test_MarkovModel.py", line 64, in <module>
p_emission=p_emission
File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/MarkovModel.py",
line 181, in _baum_welch
if not p_initial.any():
AttributeError: any
$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>> Numeric.__version__
'24.2'
>>> import numpy
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named numpy
>>>
The above is on Mac OS X 10.5 (Tiger) with Numeric installed, but not
numpy. I see something similar but slightly different on a Linux
machine with both Numeric and an old version of numpy.
Looking at the CVS log, I wonder if this is due to the switch from an
array-based "or" to an "if"-based manipulation of p_initial, p_transition
and p_emission?
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/MarkovModel.py.diff?r1=1.3&r2=1.4&cvsroot=biopython
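If the failure is indeed caused by the array object having no any() method (the traceback suggests this, but it is an assumption), one possible workaround is a small helper that uses the method when it exists and otherwise falls back to checking the elements. This is only a sketch for 1-D data; any_nonzero is a hypothetical name and not code from Bio/MarkovModel.py.

```python
def any_nonzero(values):
    """Return True if any element of a 1-D sequence or array is nonzero."""
    any_method = getattr(values, "any", None)
    if callable(any_method):
        # Array types that provide .any() can answer directly.
        return bool(any_method())
    # Fallback for objects without .any(), e.g. plain lists.
    return any(value != 0 for value in values)


class ArrayWithAny(object):
    """Tiny stand-in for an array type that does provide .any()."""

    def __init__(self, data):
        self.data = list(data)

    def any(self):
        return any(value != 0 for value in self.data)


all_zero = any_nonzero([0, 0.0, 0])
has_value = any_nonzero([0, 0.5, 0])
via_method = any_nonzero(ArrayWithAny([0, 1]))
```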
Peter
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 11:21:59 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 11:21:59 -0400
Subject: [Biopython-dev] [Bug 2588] New: tutorial blast section uses
undefined variables
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2588
Summary: tutorial blast section uses undefined variables
Product: Biopython
Version: 1.48
Platform: Other
OS/Version: All
Status: NEW
Severity: trivial
Priority: P2
Component: Documentation
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com
Tutorial Section 6.6.2 'Parsing a file full of BLAST runs' has line:
>>> blast_iterator = NCBIStandalone.Iterator(blast_handle, blast_parser)
but 'blast_handle' is undefined. This line should probably be:
>>> blast_iterator = NCBIStandalone.Iterator(result_handle, blast_parser)
where result_handle is defined in Section 6.6.1 'Parsing plain-text BLAST
output':
>>> result_handle = open("my_file_of_blast_output.txt")
Also:
>>> for b_record in b_iterator :
probably should be:
>>> for b_record in blast_iterator :
From p.j.a.cock at googlemail.com Thu Sep 18 11:25:16 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Thu, 18 Sep 2008 16:25:16 +0100
Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py
Message-ID: <320fb6e00809180825q796a7a1ay7fda222a77de678@mail.gmail.com>
Michiel & Brad,
It was my impression that for the next release of Biopython (or next
few releases?) we would support either numpy or Numeric (decided at
compile time for the C code, but at run time for pure-python modules).
I notice that with CVS revision 1.5 of Bio/Affy/CelFile.py, this file
only uses numpy (dropping support for Numeric).
http://code.open-bio.org/cgi/viewcvs.cgi/biopython/Bio/Affy/CelFile.py.diff?r1=1.4&r2=1.5&cvsroot=biopython
Was this just an oversight, or to resolve some incompatibility between
Numeric and numpy? It would be nice to support both...
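For the pure-Python modules, choosing the array library at run time could be sketched roughly as below: try each candidate module in order and use the first one that imports. The helper name import_first_available is hypothetical. In the setting discussed here the list would be ["numpy", "Numeric"]; the demonstration uses a deliberately missing module name and the standard library module math so that it runs without either package installed.

```python
import importlib


def import_first_available(module_names):
    """Import and return the first module in module_names that is available."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # not installed, try the next candidate
    raise ImportError(
        "none of these modules could be imported: %s" % ", ".join(module_names)
    )


# Stand-in for import_first_available(["numpy", "Numeric"]).
backend = import_first_available(["module_that_does_not_exist_xyz", "math"])
```

This only selects the module; code that calls functions whose names or behaviour differ between the two libraries would still need its own compatibility handling.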
Peter
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 11:33:39 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 11:33:39 -0400
Subject: [Biopython-dev] [Bug 2588] tutorial blast section uses undefined
variables
In-Reply-To:
Message-ID: <200809181533.m8IFXde1017191@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2588
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 11:33 EST -------
Well spotted! I've fixed those in CVS, plus made b_record into blast_record
for consistency with the rest of the chapter.
See biopython/Doc/Tutorial.tex CVS revision 1.159
http://code.open-bio.org/cgi/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython
Thanks Bruce,
Peter
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 17:49:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 17:49:02 -0400
Subject: [Biopython-dev] [Bug 2589] New: Errors in running tests in 1.48
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2589
Summary: Errors in running tests in 1.48
Product: Biopython
Version: 1.48
Platform: PC
OS/Version: Linux
Status: NEW
Severity: minor
Priority: P2
Component: Unit Tests
AssignedTo: biopython-dev at biopython.org
ReportedBy: bsouthey at gmail.com
I downloaded BioPython 1.48 on x64 Linux fedora rawhide (kernel
2.6.27-0.329.rc6.git2.fc10.x86_64)
$python setup.py build
$python setup.py test
The test that fails is:
ERROR: test_MarkovModel
----------------------------------------------------------------------
Traceback (most recent call last):
File "run_tests.py", line 152, in runTest
self.runSafeTest()
File "run_tests.py", line 165, in runSafeTest
cur_test = __import__(self.test_name)
File "test_MarkovModel.py", line 61, in <module>
p_emission=p_emission
File
"/home/bsouthey/bioinfo/biopython-1.48/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py",
line 199, in _baum_welch
lpseudo_initial, lpseudo_transition, lpseudo_emission,)
File
"/home/bsouthey/bioinfo/biopython-1.48/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py",
line 255, in _baum_welch_one
lp_initial[:] = lp_arcout_t[:,0]
ValueError: matrices are not aligned for copy
Also, these two messages come with no explanation, because Fdist and SimCoal are not
listed as required or optional software. They should be added to the list.
test_PopGen_FDist ... skipping. Fdist not found (not a problem if you do not
intend to use it).
test_PopGen_SimCoal ... skipping. SimCoal not found (not a problem if you do
not intend to use it).
No explanation of what this is:
test_GFF ... skipping. Environment is not configured for this test (not
important if you do not plan to use Bio.GFF).
I know these ones are skipped because MySQL is not installed, but this test should be
cleaner, especially since it involves optional software:
test_BioSQL ... skipping.
test_BioSQL_SeqIO ... skipping.
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 18:33:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 18:33:36 -0400
Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48
In-Reply-To:
Message-ID: <200809182233.m8IMXaai020481@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2589
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 18:33 EST -------
Hi Bruce,
The test_MarkovModel problem looks serious, but improvements to the messages
from skipped tests are also worthwhile.
test_MarkovModel
================
This is interesting, I've not seen this before. What version of Numeric do you
have? You can find out at the python prompt with:
import Numeric
print Numeric.__version__
test_PopGen_FDist and test_PopGen_SimCoal
=========================================
These are 3rd party population genetics tools. Do you think they should be
listed under http://biopython.org/wiki/Download#Optional_Software
test_GFF
========
This unit test requires a GFF wormbase MySQL database to be set up, plus an
environment variable for the password. This is fairly complicated to explain,
hence "Environment is not configured for this test (not important if you do not
plan to use Bio.GFF)."
test_BioSQL and test_BioSQL_SeqIO
=================================
These require a BioSQL database with python driver to be installed plus the
username and password etc to be given in setup_BioSQL.py. What message did you
get exactly, and how would you suggest improving the message given?
Thanks,
Peter
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 18:45:00 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 18:45:00 -0400
Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48
In-Reply-To:
Message-ID: <200809182245.m8IMj0NI022825@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2589
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 18:45 EST -------
(In reply to comment #1)
> test_PopGen_FDist and test_PopGen_SimCoal
> =========================================
> These are 3rd party population genetics tools. Do you think they should
> be listed under http://biopython.org/wiki/Download#Optional_Software
I've added these on the wiki (and split the list into sections). Is that
better now?
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 18:52:27 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 18:52:27 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To:
Message-ID: <200809182252.m8IMqRTk024233@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2251
------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 18:52 EST -------
For those not following the dev-mailing list, Numeric to numpy changes have
begun to be checked into CVS. Brad said he had used Ed's patch for a lot of
this - so thanks Ed!
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 23:01:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 23:01:37 -0400
Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48
In-Reply-To:
Message-ID: <200809190301.m8J31bmd012377@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2589
------- Comment #3 from bsouthey at gmail.com 2008-09-18 23:01 EST -------
(In reply to comment #1)
> Hi Bruce,
>
> The test_MarkovModel problem looks serious, but improvements to the messages
> from skipped tests are also worthwhile.
>
> test_MarkovModel
> ================
> This is interesting, I've not seen this before. What version of Numeric do you
> have? You can find out at the python prompt with:
>
> import Numeric
> print Numeric.__version__
24.2
Based on a Google search, this is a 64bit problem with Python 2.5 and Numeric.
So either do:
1) Drop the [:] from the left-hand side:
    lp_initial = lp_arcout_t[:,0]
2) Do a loop:
    for bi in range(lp_initial.shape[0]):
        lp_initial[bi] = lp_arcout_t[bi,0]
3) Support NumPy - oh, wait already done... :-)
>
> test_PopGen_FDist and test_PopGen_SimCoal
> =========================================
> These are 3rd party population genetics tools. Do you think they should be
> listed under http://biopython.org/wiki/Download#Optional_Software
Excellent! It is also good promo on what BioPython can do.
>
> test_GFF
> ========
> This unit test requires a GFF wormbase MySQL database to be setup, plus an
> environment variable for the password. This is fairly complicated to explain,
> hence "Environment is not configured for this test (not important if you do not
> plan to use Bio.GFF)."
I did not see where GFF is mentioned so a link would be worthwhile. Also, this
is probably the wrong place for the test or it should not be referenced unless
asked.
>
> test_BioSQL and test_BioSQL_SeqIO
> =================================
> These require a BioSQL database with python driver to be installed plus the
> username and password etc to be given in setup_BioSQL.py. What message did you
> get exactly, and how would you suggest improving the message given?
test_BioSQL ... skipping. Connection failed, check settings in
Tests/setup_BioSQL.py if you plan to use BioSQL: (2002, "Can't connect to local
MySQL server through socket '/var/lib/mysql/mysql.sock' (2)")
ok
test_BioSQL_SeqIO ... skipping. Connection failed, check settings in
Tests/setup_BioSQL.py if you plan to use BioSQL: (2002, "Can't connect to local
MySQL server through socket '/var/lib/mysql/mysql.sock' (2)")
ok
I initially thought to note these in the installation but after looking at the
BioSQL page, these MySQL tests should not be run to test BioPython. These are
BioSQL tests, so they should be run after MySQL and BioSQL have been set up. So
these should not be tested unless asked for.
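One way to make such tests opt-in is sketched below. This is only an
illustration using the standard unittest module, not how the Biopython test
suite does it; the BIOSQL_TESTS environment variable and the BioSQLSmokeTest
class are hypothetical names made up for this example.

```python
import os
import unittest

# Hypothetical switch: set BIOSQL_TESTS=1 to opt in to the database tests.
RUN_DB_TESTS = os.environ.get("BIOSQL_TESTS") == "1"


@unittest.skipUnless(RUN_DB_TESTS,
                     "BioSQL tests are optional; set BIOSQL_TESTS=1 after "
                     "configuring MySQL and Tests/setup_BioSQL.py")
class BioSQLSmokeTest(unittest.TestCase):
    """Hypothetical placeholder for a test that needs a live database."""

    def test_placeholder(self):
        # A real test would connect to the database here.
        self.assertTrue(True)


# Run the test case and report how many tests were skipped.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(BioSQLSmokeTest)
result = unittest.TestResult()
suite.run(result)
print(len(result.skipped))
```

With this arrangement the skip message itself tells the user what to
configure, and nothing touches MySQL unless the user asks for it.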
>
> Thanks,
>
> Peter
>
No, thanks to all the developers as this is too minor.
Bruce
From mjldehoon at yahoo.com Fri Sep 19 09:05:31 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 19 Sep 2008 06:05:31 -0700 (PDT)
Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py
In-Reply-To: <320fb6e00809180825q796a7a1ay7fda222a77de678@mail.gmail.com>
Message-ID: <78171.17163.qm@web62406.mail.re1.yahoo.com>
Actually, I was under the impression that the latest consensus was to go to NumPy directly. It's quite complicated to support both NumPy and Numerical Python, at least at the C level.
--Michiel.
--- On Thu, 9/18/08, Peter Cock wrote:
> From: Peter Cock
> Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py
> To: "BioPython-Dev Mailing List"
> Date: Thursday, September 18, 2008, 11:25 AM
> Michiel & Brad,
>
> It was my impression that for the next release of Biopython
> (or next
> few releases?) we would support either numpy or Numeric
> (decided at
> compile time for the C code, but at run time for
> pure-python modules).
>
> I notice that with CVS revision 1.5 of Bio/Affy/CelFile.py,
> this file
> only uses numpy (dropping support for Numeric).
> http://code.open-bio.org/cgi/viewcvs.cgi/biopython/Bio/Affy/CelFile.py.diff?r1=1.4&r2=1.5&cvsroot=biopython
>
> Was this just an oversight, or to resolve some
> incompatibility between
> Numeric and numpy? It would be nice to support both...
>
> Peter
From p.j.a.cock at googlemail.com Fri Sep 19 09:57:19 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 19 Sep 2008 14:57:19 +0100
Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py
In-Reply-To: <78171.17163.qm@web62406.mail.re1.yahoo.com>
References: <320fb6e00809180825q796a7a1ay7fda222a77de678@mail.gmail.com>
<78171.17163.qm@web62406.mail.re1.yahoo.com>
Message-ID: <320fb6e00809190657j662b8824n6be5ac593c13aaef@mail.gmail.com>
On Fri, Sep 19, 2008 at 2:05 PM, Michiel de Hoon wrote:
> Actually, I was under the impression that the latest consensus was to go to NumPy directly. It's quite complicated to support both NumPy and Numerical Python, at least at the C level.
I was assuming dual support for both numpy or Numeric for the next
release based on code like this:
try:
    from Numeric import x, y, z
except ImportError:
    from numpy.oldnumeric import x, y, z
where I assumed the C code would have been decided at compile time.
If a simple switch from Numeric to numpy is what you and Brad had in
mind, that's OK with me but in the python code we should just use
simple imports from numpy only.
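
The fallback pattern above can be sketched in a form that runs without Numeric
or numpy installed. The helper name import_first_available and the stand-in
module names are hypothetical, chosen only for illustration; in Biopython the
candidates would be Numeric followed by numpy.oldnumeric.

```python
import importlib


def import_first_available(module_names):
    """Return (name, module) for the first importable module in module_names.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # Try the next candidate, like the try/except pattern.
    raise ImportError("None of these modules could be imported: %s"
                      % ", ".join(module_names))


# Stand-in names so the sketch runs anywhere: the first module does not
# exist, so the second (from the standard library) is used instead.
chosen_name, chosen_module = import_first_available(
    ["no_such_numeric_module", "math"])
print(chosen_name)
```

This only covers the pure-python side; it says nothing about the C code,
which would still have to be chosen at compile time.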
Peter
From mjldehoon at yahoo.com Fri Sep 19 11:03:47 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 19 Sep 2008 08:03:47 -0700 (PDT)
Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py
In-Reply-To: <320fb6e00809190657j662b8824n6be5ac593c13aaef@mail.gmail.com>
Message-ID: <828325.7957.qm@web62406.mail.re1.yahoo.com>
> try:
>     from Numeric import x, y, z
> except ImportError:
>     from numpy.oldnumeric import x, y, z
This is the easy part. Keep in mind though that the "from numpy.oldnumeric import x, y, z" approach is only a temporary solution; at some point, the oldnumeric wrapper will disappear from numpy.
> where I assumed the C code would have been decided at
> compile time.
This is the complicated part; it's not just replacing one #include with another. We'd have to use a bunch of #ifdefs to separate the old code from the new code.
Anyway I was planning to go through the Numerical Python - dependent code to see if any other changes are needed. If anybody wants to be able to use the old Numerical Python, please let yourself be heard; otherwise I suggest we go directly to NumPy.
--Michiel
From p.j.a.cock at googlemail.com Fri Sep 19 11:42:26 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Fri, 19 Sep 2008 16:42:26 +0100
Subject: [Biopython-dev] Numeric/numpy
Message-ID: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com>
Michiel wrote:
>Peter wrote:
>> I was assuming dual support for both numpy or Numeric for the next
>> release based on code like this:
>>
>> try:
>>     from Numeric import x, y, z
>> except ImportError:
>>     from numpy.oldnumeric import x, y, z
>
> This is the easy part. Keep in mind though that the "from numpy.oldnumeric import x, y, z" approach is only a temporary solution; at some point, the oldnumeric wrapper will disappear from numpy.
Yes, if/when the oldnumeric wrapper goes away we'll have more work to
do. Something to worry about later.
>> where I assumed the C code would have been decided at
>> compile time.
>
> This is the complicated part; it's not just replacing one #include with another. We'd have to use a bunch of #ifdefs to separate the old code from the new code.
>
> Anyway I was planning to go through the Numerical Python - dependent code to see if any other
> changes are needed. If anybody wants to be able to use the old Numerical Python, please let
> yourself be heard; otherwise I suggest we go directly to NumPy.
>
> --Michiel
That suits me - how about we post something like this on the main
discussion list then?:
Dear all,
As you probably are well aware, Biopython releases to date have used
the now obsolete Numeric python library. This is no longer being
maintained and has been superseded by the numpy library. See
http://www.scipy.org/History_of_SciPy for more details on the
history of numerical python. Biopython 1.48 should be the last
Numeric only release of Biopython - we have already started moving to
numpy in CVS.
Supporting both Numeric and numpy ought to be fairly straightforward
for the pure python modules in Biopython. However, we also have C code
which must interact with Numeric/numpy, and trying to support both
would be harder.
Would anyone be inconvenienced if the next release of Biopython
supported numpy ONLY (dropping support for Numeric)? If so please
speak up now - either here or on the development mailing list.
Otherwise, a simple switch from Numeric to numpy will probably be the
most straightforward migration plan.
Thank you,
...
From bugzilla-daemon at portal.open-bio.org Fri Sep 19 14:26:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 19 Sep 2008 14:26:17 -0400
Subject: [Biopython-dev] [Bug 2591] New: GenBank files misparsed for long
organism names
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2591
Summary: GenBank files misparsed for long organism names
Product: Biopython
Version: 1.47
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: joelb at lanl.gov
I've noticed a problem with BioPython 1.47 mis-parsing the organism and lineage
in GenBank files from certain bacteria. All of the problem organisms have
names longer than 61 characters, and a line wrap is introduced into the SOURCE
and ORGANISM records, which causes the mis-parsing.
My reading of the GenBank file docs says that lines should be of variable
length rather than being split, so it appears this bug is GenBank's problem
rather than BioPython's. I have sent e-mail to info at ncbi.nlm.nih.gov about the
issue just now. GenBank doesn't seem to have a bug tracker, though, so I'm
writing the issue here to document it for other people. The issue exists for a
number of organisms (more than 6, though I haven't done the exact count).
One example may be found at
ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Salmonella_enterica_serovar_Paratyphi_A_AKU_12601/NC_011147.gbk
or
http://tinyurl.com/47yg5g
When parsing this file, the taxonomy list returned begins with
["AKU_12601 Bacteria","Proteobacteria"...
Some of the other examples have made it onto web sites which have included the
mis-parsed data, e.g. Superfam
http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gen_list.cgi?genome=x6
which shows the error for Salmonella enterica subsp. enterica serovar
Choleraesuis str. SC-B67.
I'll append the response from GenBank to this bug if and when I get one. If I
don't get one, then I'll try to come up with a workaround.
From bugzilla-daemon at portal.open-bio.org Fri Sep 19 15:05:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 19 Sep 2008 15:05:45 -0400
Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long
organism names
In-Reply-To:
Message-ID: <200809191905.m8JJ5jUY028741@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2591
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-19 15:05 EST -------
That file starts as follows:
LOCUS       NC_011147            4581797 bp    DNA     circular BCT 29-AUG-2008
DEFINITION  Salmonella enterica subsp. enterica serovar Paratyphi A str.
            AKU_12601, complete genome.
ACCESSION   NC_011147
VERSION     NC_011147.1  GI:197361212
KEYWORDS    complete genome.
SOURCE      Salmonella enterica subsp. enterica serovar Paratyphi A str.
            AKU_12601
  ORGANISM  Salmonella enterica subsp. enterica serovar Paratyphi A str.
            AKU_12601
            Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;
            Enterobacteriaceae; Salmonella.
REFERENCE   1
...
The multiline DEFINITION and SOURCE should be fine. However, we expect
ORGANISM to be a single line followed by a multiline taxonomy lineage - hence
the problem you observed.
This may well be an NCBI bug but it seems likely this kind of problem will
occur more often in future as more and more (sub)strains of bacteria are
sequenced, requiring longer names.
Let's wait and hear what the NCBI says - I expect they will have to change the
file format definition slightly.
If they say this is a valid file, I hope they will also explain officially how
we should split up the species and its lineage. One option would be something
like looking for semi-colons in the following text as indicative of the lineage
(rather than as more of the ORGANISM).
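
A rough sketch of that semi-colon idea follows. The function name
split_organism_block is hypothetical and this is not the Bio.GenBank parser
code; it also assumes the lineage always contains at least one semi-colon,
which would not hold for a one-level lineage.

```python
def split_organism_block(lines):
    """Split the text lines of an ORGANISM block into (organism, lineage).

    Heuristic: every line before the first line containing a semi-colon is
    treated as part of the organism name; the remaining lines are the lineage.
    """
    lines = [line.strip() for line in lines if line.strip()]
    first_lineage = len(lines)
    for index, line in enumerate(lines):
        if ";" in line:
            first_lineage = index
            break
    organism = " ".join(lines[:first_lineage])
    # Join the lineage lines and drop the full stop that ends the lineage.
    lineage_text = " ".join(lines[first_lineage:]).rstrip(".")
    lineage = [taxon.strip() for taxon in lineage_text.split(";")
               if taxon.strip()]
    return organism, lineage


# The ORGANISM block from NC_011147, keyword removed.
block = [
    "Salmonella enterica subsp. enterica serovar Paratyphi A str.",
    "AKU_12601",
    "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;",
    "Enterobacteriaceae; Salmonella.",
]
organism, lineage = split_organism_block(block)
print(organism)
print(lineage)
```

With this heuristic the wrapped "AKU_12601" stays with the organism name and
the lineage starts at "Bacteria".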
From chapmanb at 50mail.com Fri Sep 19 18:34:20 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Fri, 19 Sep 2008 18:34:20 -0400
Subject: [Biopython-dev] Numeric/numpy
In-Reply-To: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com>
References: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com>
Message-ID: <20080919223420.GA13009@localdomain>
Peter;
Michiel covered most everything here. My initial check-ins are
basically the try/except you describe and it looks like Michiel has
gone further and worked on real NumPy transitions.
My opinion is to post that message to the main list and move forward
with converting to NumPy exclusively as people are able to tackle
the task for different modules. Once something has been converted in
a real way, and not just using oldnumeric imports, then the
try/except can go away. I suspect not too many people will
still be stuck on Numerical, and should be excited to get up to date
with that library.
Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com
From mjldehoon at yahoo.com Fri Sep 19 23:00:09 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 19 Sep 2008 20:00:09 -0700 (PDT)
Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work?
In-Reply-To: <320fb6e00809180800ic2752dfoe00801f67b57c65c@mail.gmail.com>
Message-ID: <200283.98234.qm@web62404.mail.re1.yahoo.com>
This is an example where the old Numerical Python and the new NumPy need different code at the Python-level.
The Numerical Python -dependent code was:
p_initial = _safe_copy_and_check(p_initial, (N,)) or _random_norm(N)
Here, _safe_copy_and_check returns an array. NumPy does not allow interpreting an array as a boolean, so instead we have to use:
p_initial = _safe_copy_and_check(p_initial, (N,))
if not p_initial.any():
    p_initial = _random_norm(N)
which is the code Brad uploaded to CVS.
However, Numerical Python arrays don't have the .any() method, so this fails with the old Numerical Python.
Let's first see if anybody wants to continue using the old Numerical Python. If so, we can add some try:except: around the call to p_initial.any(). If not, then Brad's code is fine.
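
Such a try:except: might look roughly like the sketch below. The helper name
has_nonzero and the FakeArray class are hypothetical; FakeArray only stands in
for an array type that provides .any(), and a plain list stands in for one
that does not. It assumes a 1-D array such as p_initial.

```python
def has_nonzero(values):
    """Return True if any element of a 1-D array-like is non-zero.

    Uses the array's own .any() method when it exists and falls back to a
    plain Python loop otherwise (for array types without .any()).
    """
    try:
        return bool(values.any())
    except AttributeError:
        return any(v != 0 for v in values)


class FakeArray(object):
    """Hypothetical stand-in for an array type that has an .any() method."""

    def __init__(self, data):
        self.data = list(data)

    def any(self):
        return any(v != 0 for v in self.data)


print(has_nonzero(FakeArray([0.0, 0.5])))  # object with .any()
print(has_nonzero([0.0, 0.0]))             # plain list, no .any()
```

The calling code would then read "if not has_nonzero(p_initial):" in place of
"if not p_initial.any():".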
--Michiel.
--- On Thu, 9/18/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work?
> To: "BioPython-Dev Mailing List"
> Date: Thursday, September 18, 2008, 11:00 AM
> Hi all,
>
> I'm back from my one week holiday, and after I updated
> my machine I'm
> seeing a new failure in test_MarkovModel.py is probably
> related to the
> Numeric/numpy work,
>
> $ python run_tests.py test_MarkovModel.py
> test_MarkovModel ... ERROR
> (output cut)
>
> $python test_MarkovModel.py
> TESTING train_visible
> Training HMM
> Classifying
> [(['0', '0', '1', '2',
> '3', '3'], 0.0082128906250000053)]
> STATES: 0 1 2 3
> ALPHABET: A C G T
> INITIAL:
> 0: 1.00
> 1: 0.00
> 2: 0.00
> 3: 0.00
> TRANSITION:
> 0: 0.20 0.80 0.00 0.00
> 1: 0.00 0.50 0.50 0.00
> 2: 0.00 0.00 0.50 0.50
> 3: 0.00 0.00 0.00 1.00
> EMISSION:
> 0: 0.67 0.11 0.11 0.11
> 1: 0.08 0.75 0.08 0.08
> 2: 0.08 0.08 0.75 0.08
> 3: 0.03 0.03 0.03 0.91
> TESTING baum welch
> Training HMM
> Traceback (most recent call last):
> File "test_MarkovModel.py", line 64, in
>
> p_emission=p_emission
> File
> "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/MarkovModel.py",
> line 181, in _baum_welch
> if not p_initial.any():
> AttributeError: any
>
> $ python
> Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
> [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
> Type "help", "copyright",
> "credits" or "license" for more
> information.
> >>> import Numeric
> >>> Numeric.__version__
> '24.2'
> >>> import numpy
> Traceback (most recent call last):
> File "", line 1, in
> ImportError: No module named numpy
> >>>
>
> The above is on Mac OS X 10.5 (Tiger) with Numeric
> installed, but not
> numpy. I see something similar but slightly different on a
> Linux
> machine with both Numeric and an old version of numpy.
>
> Looking at the CVS log, I wonder if this is due to the
> switch from an
> array based or, to an if based manipulation of p_initial,
> p_transition
> and p_emission?
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/MarkovModel.py.diff?r1=1.3&r2=1.4&cvsroot=biopython
>
> Peter
From mjldehoon at yahoo.com Fri Sep 19 23:01:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Fri, 19 Sep 2008 20:01:18 -0700 (PDT)
Subject: [Biopython-dev] Numeric/numpy
In-Reply-To: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com>
Message-ID: <415448.81411.qm@web62403.mail.re1.yahoo.com>
OK, I'll send your message to the biopython mailing list.
--Michiel.
From biopython at maubp.freeserve.co.uk Sat Sep 20 07:31:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 20 Sep 2008 12:31:20 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
<320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
<320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
<320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com>
Message-ID: <320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com>
On Thu, Sep 18, 2008 at 1:15 PM, Peter wrote:
> I wrote:
>>> Bio.expressions was already deprecated, and seems to be a dependency
>>> of the following modules, which I have now explicitly deprecated in CVS:
>
> I plan to remove these four deprecated modules shortly, unless anyone objects:
>
> Bio.expressions (deprecated in Biopython 1.44)
> Bio.config (explicitly deprecated in Biopython 1.48)
> Bio.dbdefs (explicitly deprecated in Biopython 1.48)
> Bio.formatdefs (explicitly deprecated in Biopython 1.48)
>
> At the same time I would remove the associated bit of unused code in
> Bio/__init__.py
Done in CVS now.
Peter
From biopython at maubp.freeserve.co.uk Mon Sep 22 09:46:48 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Sep 2008 14:46:48 +0100
Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work?
In-Reply-To: <200283.98234.qm@web62404.mail.re1.yahoo.com>
References: <320fb6e00809180800ic2752dfoe00801f67b57c65c@mail.gmail.com>
<200283.98234.qm@web62404.mail.re1.yahoo.com>
Message-ID: <320fb6e00809220646s1a1ad59dvb83990c69402345e@mail.gmail.com>
On Sat, Sep 20, 2008 at 4:00 AM, Michiel de Hoon wrote:
>
> This is an example where the old Numerical Python and the new NumPy need different code at the Python-level.
> ...
Thanks for the explanation :)
> Let's first see if anybody wants to continue using the old Numerical Python.
> If so, we can add some try:except: around the call to p_initial.any(). If not,
> then Brad's code is fine.
I've added a try/except to stop the failing unit test when Numeric is
installed. This should now work with either Numeric or numpy.
Having the current fall back import system (trying to import Numeric,
falling back on importing numpy) makes sense for transition releases
with support for both. However, if we all agree to do a straight
switch from Numeric to numpy for Biopython 1.49, then I think we
shouldn't try importing from Numeric at all.
Peter
From biopython at maubp.freeserve.co.uk Mon Sep 22 10:32:59 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Sep 2008 15:32:59 +0100
Subject: [Biopython-dev] BOSC 2008 presentation
Message-ID: <320fb6e00809220732x26350cb2ie71051fd15c2770e@mail.gmail.com>
Peter wrote:
>>>
>>> This reminds me that I could/should make a PDF version of the BOSC
>>> 2008 slides to go online here:
>>> http://biopython.org/wiki/Documentation#Presentations
>>>
I've managed to turn the powerpoint version of the Biopython BOSC 2008
talk into a PDF file which is now online. I had to tweak some font
settings (powerpoint on the Mac doesn't show things exactly as it does
on a PC), but this should match up with the version on slideshare. If
anyone spots any mistakes or discrepancies worth fixing, please let me
know.
Peter
From bugzilla-daemon at portal.open-bio.org Mon Sep 22 10:54:35 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 22 Sep 2008 10:54:35 -0400
Subject: [Biopython-dev] [Bug 2592] New: numpy migration for Bio.PDB.Vector
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2592
Summary: numpy migration for Bio.PDB.Vector
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Other
AssignedTo: biopython-dev at biopython.org
ReportedBy: meesters at uni-mainz.de
see http://lists.open-bio.org/pipermail/biopython/2008-September/004505.html
The code is pretty similar to the original one. I don't mind, if it won't be
used.
Vector.py:
from scipy.linalg import det #determinant
from numpy import allclose, arccos, array, cos, dot, eye, float32, matrix, sin, \
    sqrt, sum, trace, transpose, zeros
from math import acos, pi
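def m2rotaxis(m):
    """
    Return angles, axis pair that corresponds to rotation matrix m.
    """
    # Angle always between 0 and pi
    # Sense of rotation is defined by axis orientation
    t=0.5*(trace(m)-1)
    t=max(-1, t)
    t=min(1, t)
    angle=acos(t)
    if angle<1e-15:
        # Angle is 0
        return 0.0, Vector(1,0,0)
    # NOTE: the archived copy lost the text between "angle<" and ">m11" below;
    # the lines up to "if m00>m11" are restored on the assumption that they
    # match the stock Bio.PDB.Vector code.
    elif angle<pi:
        # Angle is smaller than pi
        x=m[2,1]-m[1,2]
        y=m[0,2]-m[2,0]
        z=m[1,0]-m[0,1]
        axis=Vector(x,y,z)
        axis.normalize()
        return angle, axis
    else:
        # Angle is pi - special case!
        m00=m[0,0]
        m11=m[1,1]
        m22=m[2,2]
        if m00>m11 and m00>m22:
            x=sqrt(m00-m11-m22+0.5)
            y=m[0,1]/(2*x)
            z=m[0,2]/(2*x)
        elif m11>m00 and m11>m22:
            y=sqrt(m11-m00-m22+0.5)
            x=m[0,1]/(2*y)
            z=m[1,2]/(2*y)
        else:
            z=sqrt(m22-m00-m11+0.5)
            x=m[0,2]/(2*z)
            y=m[1,2]/(2*z)
        axis=Vector(x,y,z)
        axis.normalize()
        return pi, axis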
def vector_to_axis(line, point):
    """
    Returns the vector between a point and
    the closest point on a line (ie. the perpendicular
    projection of the point on the line).
    @type line: L{Vector}
    @param line: vector defining a line
    @type point: L{Vector}
    @param point: vector defining the point
    """
    line=line.normalized()
    np=point.norm()
    angle=line.angle(point)
    return point-line**(np*cos(angle))
def calc_angle(v1, v2, v3):
    """
    Calculate the angle between 3 vectors
    representing 3 connected points.
    @param v1, v2, v3: the three points that define the angle
    @type v1, v2, v3: L{Vector}
    @return: angle
    @rtype: float
    """
    v1=v1-v2
    v3=v3-v2
    return v1.angle(v3)
def calc_dihedral(v1, v2, v3, v4):
    """
    Calculate the dihedral angle between 4 vectors
    representing 4 connected points. The angle is in
    ]-pi, pi].
    @param v1, v2, v3, v4: the four points that define the dihedral angle
    @type v1, v2, v3, v4: L{Vector}
    """
    ab=v1-v2
    cb=v3-v2
    db=v4-v3
    u=ab**cb
    v=db**cb
    w=u**v
    angle=u.angle(v)
    # Determine sign of angle
    try:
        if cb.angle(w)>0.001:
            angle=-angle
    except ZeroDivisionError:
        # dihedral=pi
        pass
    return angle
def rotaxis(theta, vector):
    """
    Calculate a left multiplying rotation matrix that rotates
    theta rad around vector.
    Example:
    >>> m=rotaxis(pi, Vector(1,0,0))
    >>> rotated_vector=any_vector.left_multiply(m)
    @type theta: float
    @param theta: the rotation angle
    @type vector: L{Vector}
    @param vector: the rotation axis
    @return: The rotation matrix, a 3x3 Numeric array.
    """
    vector=vector.copy()
    vector.normalize()
    c=cos(theta)
    s=sin(theta)
    t=1-c
    x,y,z=vector.get_array()
    rot=zeros((3,3), "d")
    # 1st row
    rot[0,0]=t*x*x+c
    rot[0,1]=t*x*y-s*z
    rot[0,2]=t*x*z+s*y
    # 2nd row
    rot[1,0]=t*x*y+s*z
    rot[1,1]=t*y*y+c
    rot[1,2]=t*y*z-s*x
    # 3rd row
    rot[2,0]=t*x*z-s*y
    rot[2,1]=t*y*z+s*x
    rot[2,2]=t*z*z+c
    return rot
def refmat(p,q):
    """
    Return a (left multiplying) matrix that mirrors p onto q.
    Example:
    >>> mirror=refmat(p,q)
    >>> qq=p.left_multiply(mirror)
    >>> print q, qq # q and qq should be the same
    @type p,q: L{Vector}
    @return: The mirror operation, a 3x3 Numeric array.
    """
    p.normalize()
    q.normalize()
    if (p-q).norm()<1e-5:
        return eye(3)
    pq=p-q
    pq.normalize()
    b=pq.get_array()
    b.shape=(3, 1)
    i=eye(3)
    ref=i-2* dot(b, transpose(b))
    return ref
def rotmat(p,q):
    """
    Return a (left multiplying) matrix that rotates p onto q.
    Example:
    >>> r=rotmat(p,q)
    >>> print q, p.left_multiply(r)
    @param p: moving vector
    @type p: L{Vector}
    @param q: fixed vector
    @type q: L{Vector}
    @return: rotation matrix that rotates p onto q
    @rtype: 3x3 Numeric array
    """
    # refmat returns plain arrays, so use dot for the matrix product;
    # "*" would multiply elementwise.
    rot=dot(refmat(q, -p), refmat(p, -p))
    return rot
class Vector(object):
    "3D vector"
    def __init__(self, x, y=None, z=None):
        if y is None and z is None:
            # Array, list, tuple...
            if len(x)!=3:
                raise ValueError("Vector: x is not a list/tuple/array of 3 numbers")
            self._ar=array(x)
        else:
            # Three numbers
            self._ar=array([x, y, z])
    def __eq__(self, other):
        return allclose(self._ar, other._ar, 0.01)
    def __ne__(self, other):
        return not self.__eq__(other)
    def __repr__(self):
        x, y, z = self._ar
        # The format string was stripped from the archived message (text in
        # angle brackets); restored assuming the stock Bio.PDB.Vector format.
        return "<Vector %.2f, %.2f, %.2f>" % (x, y, z)
    def __neg__(self):
        "Return Vector(-x, -y, -z)"
        return Vector(-self._ar)
    def __add__(self, other):
        "Return Vector+other Vector or scalar"
        if isinstance(other, Vector):
            a=self._ar+other._ar
        else:
            a=self._ar+array(other)
        return Vector(a)
    def __sub__(self, other):
        "Return Vector-other Vector or scalar"
        if isinstance(other, Vector):
            a=self._ar-other._ar
        else:
            a=self._ar-array(other)
        return Vector(a)
    def __mul__(self, other):
        "Return Vector.Vector (dot product)"
        return sum(self._ar*other._ar)
    def __div__(self, x):
        "Return Vector(coords/a)"
        a=self._ar/array(x)
        return Vector(a)
    def __pow__(self, other):
        "Return VectorxVector (cross product) or Vectorxscalar"
        if isinstance(other, Vector):
            a,b,c=self._ar
            d,e,f=other._ar
            c1=det(array(((b,c), (e,f))))
            c2=-det(array(((a,c), (d,f))))
            c3=det(array(((a,b), (d,e))))
            return Vector(c1,c2,c3)
        else:
            a=self._ar*array(other)
            return Vector(a)
    def __getitem__(self, i):
        return self._ar[i]
    def __setitem__(self, i, value):
        self._ar[i]=value
    def norm(self):
        "Return vector norm"
        return sqrt(sum(self._ar*self._ar))
    def normsq(self):
        "Return square of vector norm"
        return abs(sum(self._ar*self._ar))
    def normalize(self):
        "Normalize the Vector"
        self._ar=self._ar/self.norm()
    def normalized(self):
        "Return a normalized copy of the Vector"
        v = self.copy()
        v.normalize()
        return v
    def angle(self, other):
        "Return angle between two vectors"
        n1=self.norm()
        n2=other.norm()
        c=(self*other)/(n1*n2)
        # Take care of roundoff errors
        c=min(c,1)
        c=max(-1,c)
        return arccos(c)
    def get_array(self):
        "Return (a copy of) the array of coordinates"
        return array(self._ar)
    def left_multiply(self, matrix):
        "Return Vector=Matrix x Vector"
        return Vector(dot(matrix, self._ar))
    def right_multiply(self, matrix):
        "Return Vector=Vector x Matrix"
        return Vector(dot(self._ar, matrix))
    def copy(self):
        "Return a deep copy of the Vector"
        return Vector(self._ar)
#xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
test_Vector.py:
import unittest
from math import pi, degrees
from numpy import array, allclose, transpose
from Vector import Vector, calc_angle, calc_dihedral, refmat, rotmat, \
    rotaxis, vector_to_axis
class TestVectorFunctions(unittest.TestCase):
    """
    Vector-class test functions
    """
    def setUp(self):
        self.v1 = Vector(0,0,1)
        self.v2 = Vector(0,0,0)
        self.v3 = Vector(0,1,0)
        self.v4 = Vector(1,1,0)
        self.v5 = Vector(-1,-1,0)
        self.ref_array = array([[ 1.0, 0.0, 0.0],
                                [ 0.0, 0.0, 1.0],
                                [ 0.0, 1.0, 0.0]])
        self.tolerance = 0.001
    def test__eq__(self):
        """Vector.__eq__ should return Boolean for equality check on two
        Vectors: testing True"""
        self.assert_(self.v1 == self.v1)
    def test__eq__2(self):
        """Vector.__eq__ should return Boolean for equality check on two
        Vectors: testing False"""
        self.failIf(self.v1 == self.v2)
    def test__ne__(self):
        """Vector.__ne__ should return Boolean for non-equal Vectors: testing
        True"""
        self.assert_(self.v1 != self.v2)
    def test__ne__2(self):
        """Vector.__ne__ should return Boolean for non-equal Vectors: testing
        False"""
        self.failIf(self.v1 != self.v1)
    def test__repr__(self):
        """Vector.__repr__ should return a Vector-object as a nice string"""
        # Expected string restored; the archive stripped the text in angle
        # brackets (assumes the stock "<Vector x, y, z>" format).
        self.assertEqual(repr(self.v1), "<Vector 0.00, 0.00, 1.00>")
    def test__neg__(self):
        """Vector.__neg__ should return Vector-object * -1"""
        v = Vector(0,0,-1)
        self.assertEqual(-self.v1, v)
    def test__add__(self):
        """testing Vector.__add__: Vector + Vector"""
        v = Vector(1,1,1)
        v2 = Vector(1,1,2)
        self.assertEqual(self.v1+v, v2)
    def test__add__2(self):
        """testing Vector.__add__: Vector + scalar"""
        v = Vector(3,3,4)
        self.assertEqual(self.v1+3, v)
    def test__add__3(self):
        """testing Vector.__add__: Vector + scalars"""
        v = Vector(1,2,4)
        self.assertEqual(self.v1+(1,2,3), v)
    def test__sub__(self):
        """testing Vector.__sub__(): Vector - Vector"""
        self.assertEqual(self.v1-self.v1, self.v2)
    def test__sub__2(self):
        """testing Vector.__sub__(): Vector-scalar"""
        self.assertEqual(self.v1-1, self.v5)
    def test__sub__3(self):
        """testing Vector.__sub__(): Vector-scalars"""
        v = Vector(-1,-2,-2)
        self.assertEqual(self.v1-(1,2,3), v)
    def test__mul__(self):
        """testing Vector.__mul__()"""
        self.assertEqual(self.v1 * self.v2, 0)
    def test__pow__(self):
        """testing Vector.__pow__()"""
        self.assertEqual(self.v1** self.v2, self.v2)
    def test__getitem__(self):
        """testing Vector.__getitem__"""
        self.assertEqual(self.v1[0], 0)
    def test__setitem__(self):
        """testing Vector.__setitem__"""
        v = self.v3
        v[0] = 1
        self.assertEqual(v, self.v4)
    def testNorm(self):
        """testing Vector.norm()"""
        self.assertEqual(self.v4, self.v4)
    def testNormsq(self):
        """testing Vector.normsq()"""
        self.assertEqual(self.v4, self.v4)
    def testNomalize(self):
        """testing Vector.normalize()"""
        self.v4.normalize()
        v = Vector(0.71, 0.71, 0.00)
        self.assertEqual(self.v4, v)
    def testNomalized(self):
        """testing Vector.normalized()"""
        self.v4.normalize()
        v = Vector(0.71, 0.71, 0.00)
        self.assertEqual(self.v4, v)
    def testAngle(self):
        """testing Vector.angle()"""
        self.assertEqual(degrees(self.v2.angle(self.v1)), 180)
    def testGetarray(self):
        """testing Vector.get_array()"""
        self.assert_(all(self.v1.get_array() == array((0,0,1))))
    def testCopy(self):
        """testing Vector.copy()"""
        self.assertEqual(self.v1.copy(), self.v1)
    def testCalcangle(self):
        """testing calc_angle()"""
        self.assertEqual(degrees(calc_angle(self.v1, self.v2, self.v3)), 90.0)
    def testRefmat(self):
        """testing refmat()"""
        self.assert_(allclose(refmat(self.v1, self.v3), self.ref_array,
                              self.tolerance))
    def testRotmat(self):
        """testing rotmat()"""
        self.assert_(allclose(refmat(self.v1, self.v3), self.ref_array,
                              self.tolerance))
    def testLeftmultiply(self):
        """testing Vector.left_multiply()"""
        self.assertEqual(self.v1.left_multiply(self.ref_array), self.v3)
    def testRightmultiply(self):
        """testing Vector.right_multiply()"""
        self.assertEqual(self.v1.right_multiply(transpose(self.ref_array)),
                         self.v3)
    def testRotaxis(self):
        """testing rotaxis()"""
        a = array([[ -1.0, 0, 0.0],
                   [0.0, -1.0, 0.0],
                   [0.0, 0.0, 1.0]])
        self.assert_(allclose(rotaxis(pi, self.v1), a, self.tolerance))
    def testVector_to_axis(self):
        """testing vector_to_axis"""
        self.assertEqual(vector_to_axis(self.v5, self.v1), self.v1)
    def testCalc_dihedral(self):
        """testing calc_dihedral"""
        self.assertEqual(degrees(calc_dihedral(self.v1, self.v2, self.v3,
                                               self.v4)), 90)
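The axis-angle formula used in rotaxis can be checked independently of numpy. The following is only a sketch in plain Python: the helper names rotaxis_matrix and left_multiply are made up for this illustration and are not part of the posted module. It rebuilds the same nine matrix entries and rotates the x axis by a quarter turn about z, which should land on the y axis.

```python
from math import cos, sin, pi, sqrt

def rotaxis_matrix(theta, axis):
    """Rotation matrix (nested lists) for theta radians about axis."""
    x, y, z = axis
    n = sqrt(x * x + y * y + z * z)
    x, y, z = x / n, y / n, z / n  # normalise the axis first
    c, s = cos(theta), sin(theta)
    t = 1 - c
    # Same entries, row by row, as in the rotaxis function of the message.
    return [
        [t * x * x + c,     t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, t * y * y + c,     t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, t * z * z + c],
    ]

def left_multiply(m, v):
    """Return the matrix-vector product m . v as a list."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

# Rotating the x axis by a quarter turn about z should give the y axis.
rotated = left_multiply(rotaxis_matrix(pi / 2, (0, 0, 1)), (1, 0, 0))
print([round(value, 6) for value in rotated])
```

A check like this compares against a tolerance rather than testing exact equality, because cos(pi/2) is not exactly zero in floating point.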
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Mon Sep 22 13:39:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Sep 2008 18:39:46 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
<320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
<320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
<320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com>
<320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com>
Message-ID: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>
As part of the general Martel/Mindy clean up, I've added a deprecation
warning to Mindy and several other closely related modules, but have
only made a docstring change to Martel. I'm not sure if we should add
a deprecation warning to Martel directly - it would be triggered by
running the Biopython setup.py file which is nasty. Perhaps for this
special case, documentation is enough?
Summary:
* Martel - labelled as deprecated for 1.49, but no explicit warning (see above)
* Bio.Mindy - deprecated for 1.49
* Bio.Std - deprecated for 1.49
* Bio.StdHandler - deprecated for 1.49
* Bio.builders - deprecated for 1.49
* Bio.Decode - deprecated for 1.49
* Bio.Writer (and Bio.writers.*) deprecated in 1.48
* Bio.expressions - deprecated in 1.44, removed for 1.49
* Bio.config - effectively deprecated in 1.44, explicitly in 1.48,
removed for 1.49
* Bio.dbdefs - effectively deprecated in 1.44, explicitly in 1.48,
removed for 1.49
* Bio.formatdefs - effectively deprecated in 1.44, explicitly in 1.48,
removed for 1.49
Open questions:
* Bio.DBXRef - does anyone know what this is for?
* Bio.SGMLExtractor - deprecated in 1.46, ready for removal?
As a bonus once we've moved from CVS to SVN, we should be able to
remove some of the now empty directories in CVS :)
Peter
From mjldehoon at yahoo.com Mon Sep 22 20:55:01 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 22 Sep 2008 17:55:01 -0700 (PDT)
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>
Message-ID: <553736.99426.qm@web62402.mail.re1.yahoo.com>
The code in setup.py that causes the DeprecationWarning to appear can be fixed relatively easily. Basically, this code was relevant when Martel was also being distributed as a separate module. Nowadays, Biopython contains the latest version of Martel, so there's no reason to check it. The other function of the code in setup.py was to check if importing Martel raises any import errors, in particular for mxTextTools. That we can check by trying to import the dependencies directly, i.e. without going through Martel.
After fixing the code in setup.py, we can add a DeprecationWarning to Bio.Martel.
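Checking a dependency directly, without importing the package that uses it, could look roughly like the sketch below. It relies on importlib.util.find_spec from current Python (not available in 2008), the helper name missing_dependencies is hypothetical, and the module names are stand-ins rather than the real mxTextTools module name.

```python
import importlib.util

def missing_dependencies(names):
    """Return the top-level module names from names that cannot be found."""
    # find_spec returns None for an absent top-level module and does not
    # execute the module, so no DeprecationWarning from it can be triggered.
    return [name for name in names if importlib.util.find_spec(name) is None]

# "no_such_module_for_demo" is a made-up name used to show the failure path.
print(missing_dependencies(["math", "no_such_module_for_demo"]))
```

Only top-level names are handled here; for a dotted name, find_spec imports the parent package first and may raise instead of returning None.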
--Michiel.
From biopython at maubp.freeserve.co.uk Tue Sep 23 05:02:17 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 10:02:17 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <553736.99426.qm@web62402.mail.re1.yahoo.com>
References: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>
<553736.99426.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00809230202k507e8ac3m16dc889b245e1c51@mail.gmail.com>
On Tue, Sep 23, 2008 at 1:55 AM, Michiel de Hoon wrote:
> The code in setup.py that causes the DeprecationWarning to
> appear can be fixed relatively easily. Basically, this code was
> relevant when Martel was also being distributed as a separate
> module. Nowadays, Biopython contains the latest version of
> Martel, so there's no reason to check it. The other function of the
> code in setup.py was to check if importing Martel raises any
> import errors, in particular for mxTextTools. That we can check
> by trying to import the dependencies directly, i.e. without going
> through Martel. After fixing the code in setup.py, we can add a
> DeprecationWarning to Bio.Martel.
That sounds positive. I was thinking we might want to edit setup.py
so that it doesn't complain loudly if mxTextTools is missing - given
this will now only be needed for deprecated modules. Do you have any
view on this?
Peter
From biopython at maubp.freeserve.co.uk Tue Sep 23 05:12:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 10:12:32 +0100
Subject: [Biopython-dev] Numeric/numpy
In-Reply-To: <415448.81411.qm@web62403.mail.re1.yahoo.com>
References: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com>
<415448.81411.qm@web62403.mail.re1.yahoo.com>
Message-ID: <320fb6e00809230212tb5a763cp8fdd58ef90fcc6ba@mail.gmail.com>
I was just thinking about the situation where people have both Numeric
and numpy installed, and that rather than using:
try:
from Numeric import x, y, z
except ImportError:
from numpy.oldnumeric import x, y, z
arguably we should be giving numpy priority. One solution would be
something like this:
try:
from numpy.oldnumeric import x, y, z
except ImportError, e:
try :
from Numeric import x, y, z
except ImportError :
raise e #Want to complain about numpy, not Numeric
Unfortunately this is rather long!
Alternatively, shall we wait until the end of the week (say), and if
no-one objects to a straight switch from Numeric to numpy, proceed
with just the following?:
from numpy.oldnumeric import x, y, z
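The "prefer one package, fall back to another, but complain about the preferred one" pattern can be wrapped in a small helper so that each module does not need the nested try blocks. This is only a sketch in current Python syntax: import_first is a hypothetical name, and the module names in the demo are stand-ins so that it runs without Numeric or numpy installed.

```python
import importlib

def import_first(candidates):
    """Import and return the first importable module named in candidates.

    If none can be imported, re-raise the error from the first (preferred)
    candidate, so the message names the preferred package.
    """
    if not candidates:
        raise ValueError("no candidate module names given")
    first_error = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as error:
            if first_error is None:
                first_error = error
    raise first_error

# Stand-in names: the preferred module is missing, so the fallback is used.
module = import_first(["preferred_module_that_is_missing", "math"])
print(module.__name__)
```

The helper returns a module object, so names are then reached as attributes of it instead of through a from-import.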
Peter
From chapmanb at 50mail.com Tue Sep 23 08:08:09 2008
From: chapmanb at 50mail.com (Brad Chapman)
Date: Tue, 23 Sep 2008 08:08:09 -0400
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
<320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
<320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
<320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
<320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
<320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com>
<320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com>
<320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>
Message-ID: <20080923120809.GG13074@localdomain>
Hi Peter;
Thanks for your work cleaning this up.
> Open questions:
> * Bio.DBXRef - does anyone know what this is for?
> * Bio.SGMLExtractor - deprecated in 1.46, ready for removal?
DBXref is associated with all the Martel parsing, so it can be
removed/deprecated as well. It was used in building SeqRecords from
Martel descriptions (Bio.builders.SeqRecord.sequence).
Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com
From lpritc at scri.ac.uk Tue Sep 23 08:52:38 2008
From: lpritc at scri.ac.uk (Leighton Pritchard)
Date: Tue, 23 Sep 2008 13:52:38 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <20080923120809.GG13074@localdomain>
Message-ID:
Hi all,
It looks like Bio.DBXRef provides a dictionary of dictionaries that
associate database identifiers from a number of file formats with the
appropriate databases. This sort of thing might be useful to keep around
(i.e. not to have to rebuild from scratch) if there is an intention to
populate the dbxref table with consistent Dbnames for divergent identifiers.
However, Peter appears to have noted in the code for Loader.py that this
behaviour would be inconsistent with the other Bio* projects, and mentions
bug 2405 in that context.
L.
On 23/09/2008 13:08, "Brad Chapman" wrote:
> Hi Peter;
> Thanks for your work cleaning this up.
>
>> Open questions:
>> * Bio.DBXRef - does anyone know what this is for?
>> * Bio.SGMLExtractor - deprecated in 1.46, ready for removal?
>
> DBXref is associated with all the Martel parsing, so it can be
> removed/deprecated as well. It was used in building SeqRecords from
> Martel descriptions (Bio.builders.SeqRecord.sequence).
>
> Brad
--
Dr Leighton Pritchard MRSC
D131, Plant Pathology Programme, SCRI
Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA
e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard
gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From jblanca at btc.upv.es Tue Sep 23 08:40:24 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Tue, 23 Sep 2008 14:40:24 +0200
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
Message-ID: <200809231440.24684.jblanca@btc.upv.es>
Hi:
I'm still interested in the design of the Sequence and Alignment classes. For
my work I need sequence classes with some extended features. I need a
SequenceWithQuality class and a Seq class capable of holding information
about features located in different regions of the sequence.
I could use SeqRecord for the sequence with features and extend Seq for the
SequenceWithQuality, but I have found some problems with this approach.
SeqRecord still doesn't have a __getitem__ method. Also, SeqRecord exposes the
implementation of the features collection: it's a public list. I think that
is a limitation. For instance, we might want to check whether an added
feature lies inside the region covered by the sequence. We also can't ask
for features by their name or type.
I understand that keeping compatibility is paramount for BioPython and I share
that concern. I also understand that having two classes to do the same job is
not a nice thing. Nevertheless I have been thinking about these issues and I
have implemented an immutable sequence class with these ideas in mind. I
plan to use this implementation to write an Alignment class capable of
dealing with EST assemblies.
The biggest difference between this proposal and the code currently in
BioPython is the LocatableFeature and Location classes. LocatableFeature is
equivalent to SeqFeature, but while SeqFeature is mostly a struct with no
methods, LocatableFeature has __getitem__, __len__ and complement. Location
is inspired by the BioRange BioPerl class.
I would like to have equivalent functions in BioPython and I'm willing to help
adapt the current BioPython classes. I would appreciate hearing
your suggestions and criticisms about the classes that I'm sending.
Best regards,
Jose Blanca
P.S. The test files contain detailed information about how these classes
would work.
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: biolib.tar.gz
Type: application/x-tgz
Size: 9873 bytes
Desc: not available
URL:
From mjldehoon at yahoo.com Tue Sep 23 09:58:14 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 23 Sep 2008 06:58:14 -0700 (PDT)
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809230202k507e8ac3m16dc889b245e1c51@mail.gmail.com>
Message-ID: <914965.75347.qm@web62403.mail.re1.yahoo.com>
> That sounds positive. I was thinking we might want to edit
> setup.py
> so that it doesn't complain loudly if mxTextTools is
> missing - given
> this will now only be needed for deprecated modules. Do
> you have any
> view on this?
Since mxTextTools is only needed at run time and not at compile time, I think we do not have to check at all if it is present or not. Then in Martel, if importing mxTextTools fails, we can give an informative error message saying that the user should install mxTextTools. Since Martel is deprecated anyway, I think that that is quite sufficient. Compare it to ReportLab: Bio.Graphics imports it, but we don't check in setup.py if it is present or not.
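An informative run-time failure of the kind described could be produced along these lines. This is a sketch in current Python: the helper name require and the hint text are hypothetical, and a made-up module name stands in for the real dependency.

```python
import importlib

def require(module_name, hint):
    """Import module_name, or raise ImportError carrying an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        # Keep the original message and append the hint for the user.
        raise ImportError("%s (%s)" % (error, hint)) from error

try:
    require("no_such_module_for_demo", "please install the missing package")
except ImportError as error:
    print(error)
```

Because the check runs only when the dependent module is actually imported, nothing has to be verified at install time.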
--Michiel
From biopython at maubp.freeserve.co.uk Tue Sep 23 10:37:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 15:37:29 +0100
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <200809231440.24684.jblanca@btc.upv.es>
References: <200809231440.24684.jblanca@btc.upv.es>
Message-ID: <320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com>
On Tue, Sep 23, 2008 at 1:40 PM, Jose Blanca wrote:
> Hi:
> I'm still interested on the design of the Sequence and Alignment classes. For
> my work I need sequence classes with some extended features. I need a
> SequenceWithQuality class and a Seq class capable of holding information
> about features located in different regions of the sequence.
> I could use SeqRecord for the sequence with features and extend Seq for the
> SequenceWithQuality, but I have found some problems with this approach.
I would also like to be able to have SeqRecord or Seq objects with a
quality sequence. This is probably more important than a general "per
letter annotation" system for sequences. Would you want to use
integers, floats or characters for the quality scores?
> SeqRecord still doesn't have a __getitem__ method.
What do you think of the __getitem__ method proposed in attachment 942
on Bug 2507?
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
> Also, SeqRecord exposes the implementation of the features collection,
> it's a public list. That I think is a limitation. For instance, we could be interested
> in controlling if a the feature added is inside the region covered by the sequence.
Yes, because it is currently a public list we can't easily stop the
user putting inappropriate features (or other objects) into the list.
A list-like sub-class with some brains behind it might be one
backwards compatible approach. But do we really need to worry about
this?
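A "list-like sub-class with some brains" might be sketched as below. Everything here is hypothetical: the class name BoundedFeatureList is invented, and features are reduced to plain (start, end) tuples rather than SeqFeature objects, purely to show where a bounds check could sit.

```python
class BoundedFeatureList(list):
    """List of (start, end) features that must lie within a sequence."""

    def __init__(self, seq_length, features=()):
        super().__init__()
        self.seq_length = seq_length
        for feature in features:
            self.append(feature)

    def _check(self, feature):
        # Reject anything that is not inside 0..seq_length.
        start, end = feature
        if not 0 <= start <= end <= self.seq_length:
            raise ValueError(
                "feature %r lies outside 0..%d" % (feature, self.seq_length)
            )
        return feature

    def append(self, feature):
        super().append(self._check(feature))

    def extend(self, features):
        for feature in features:
            self.append(feature)

    def insert(self, index, feature):
        super().insert(index, self._check(feature))

features = BoundedFeatureList(100, [(0, 10)])
features.append((90, 100))
try:
    features.append((95, 120))
except ValueError as error:
    print(error)
```

Existing code that treats the attribute as a list keeps working, which is the backwards compatible part. A complete version would also have to guard item assignment and in-place addition, which this sketch leaves unchecked.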
> We can't also ask for features by their name or type.
You can work around this by creating a lookup dictionary, e.g.
http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features
Perhaps we could add a "lookup feature" function given say an
annotation key (e.g. "locus_tag") and value (e.g. "NEQ010") plus
perhaps feature type (e.g. "CDS").
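Such a lookup function could be sketched as follows. The function name lookup_features is hypothetical, and a namedtuple stands in for a feature; the sketch assumes each feature has a type string and a qualifiers dictionary mapping a key to a list of values.

```python
from collections import namedtuple

# Minimal stand-in for a feature: a type string plus a qualifiers dict
# mapping each key to a list of values.
Feature = namedtuple("Feature", ["type", "qualifiers"])

def lookup_features(features, key, value, feature_type=None):
    """Return the features whose qualifiers[key] contains value."""
    return [
        f for f in features
        if (feature_type is None or f.type == feature_type)
        and value in f.qualifiers.get(key, [])
    ]

features = [
    Feature("gene", {"locus_tag": ["NEQ010"]}),
    Feature("CDS", {"locus_tag": ["NEQ010"]}),
    Feature("CDS", {"locus_tag": ["NEQ011"]}),
]
print(lookup_features(features, "locus_tag", "NEQ010", "CDS"))
```

This scans the whole list on every call; for repeated lookups the dictionary approach from the link above would be the faster choice.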
> I understand that keeping compatibility is paramount for BioPython and I share
> that concern. I also understand that having two classes to do the same job is
> not a nice thing.
I agree. Especially now that Bio.SeqIO and AlignIO seem to be working
out pretty well and these are pretty tied into the SeqRecord object.
> Nevertheless I have been thinking about these issues and I have
> implemented a non-mutable sequence class with these ideas in mind. I
> plan to use this implementation to write an Alignment class capable of
> dealing with ESTs assemblies.
Dealing nicely with EST assemblies is a valuable goal.
> The most different aspect of this proposal and the code actually alive in
> BioPython are the LocatableFeature and Location classes. LocatableFeature is
> equivalent to SeqFeature, but while SeqFeature is mostly a struct with no
> methods LocatableFeature has a __getitem__, __len__ and complement.
> Location is inspired by the BioRange BioPerl class.
I personally don't like the current way Biopython stores the location
for SeqFeatures containing sub-features (e.g. anything with a join).
The join-location can only be determined from a combination of the
location of each sub-feature. However, this standard is currently
implemented and stable, and supported in Biopython's BioSQL wrapper.
> I would like to have equivalent functions in BioPython and I'm willing to help
> in the adaptation the actual BioPython classes. I would appreciate to hear
> your suggestions and criticisms about the classes that I'm sending.
> Best regards,
If there are enough people interested in re-working the
Seq/MutableSeq/SeqRecord objects with an API break, we could seriously
discuss this as part of a hypothetical "Biopython 2.0". Once we move
from CVS to SVN it would also be possible to set up a branch in the
repository to experiment there. However, I think there is still
plenty of potential for improving things in a backwards compatible
manner (and have opened several enhancement bugs on bugzilla for this).
I would like to try and tackle these before breaking the existing
API.
Peter
From biopython at maubp.freeserve.co.uk Tue Sep 23 10:52:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 15:52:14 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <914965.75347.qm@web62403.mail.re1.yahoo.com>
References: <320fb6e00809230202k507e8ac3m16dc889b245e1c51@mail.gmail.com>
<914965.75347.qm@web62403.mail.re1.yahoo.com>
Message-ID: <320fb6e00809230752q579133b1v6f89260b00811e8f@mail.gmail.com>
On Tue, Sep 23, 2008 at 2:58 PM, Michiel de Hoon wrote:
>> That sounds positive. I was thinking we might want to edit
>> setup.py so that it doesn't complain loudly if mxTextTools is
>> missing - given this will now only be needed for deprecated
>> modules. Do you have any view on this?
>
> Since mxTextTools is only needed at run time and not at compile
> time, I think we do not have to check at all if it is present or not.
Agreed - done in CVS.
> Then in Martel, if importing mxTextTools fails, we can give an
> informative error message saying that the user should install
> mxTextTools.
That might be worth exploring - many people will probably be able to
deduce this from an ImportError, but an informative error is more
helpful.
> Since Martel is deprecated anyway, I think that that is quite
> sufficient. Compare it to ReportLab: Bio.Graphics imports it,
> but we don't check in setup.py if it is present or not.
OK.
Peter
From bugzilla-daemon at portal.open-bio.org Tue Sep 23 11:00:21 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 23 Sep 2008 11:00:21 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200809231500.m8NF0LeP017539@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-23 11:00 EST -------
(In reply to comment #2)
> Does this means that SeqRecord would deprecate the .seq attribute? If the .seq
> attribute is not removed slicing could be used in it like: my_seq[1:100] and
> my_seq.seq[1:100].
>
If you had a SeqRecord, record, then yes with this patch you could do:
record[1:100] - gives another SeqRecord with annotation
record.seq[1:100] - gives a Seq object with no annotation
record[1:100].seq - should give an equivalent Seq object with no annotation
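The three cases can be illustrated with a toy class. This is not the patch attached to the bug: MiniRecord is a made-up stand-in that holds a plain string and an identifier, and the behaviour for a single integer index is an assumption.

```python
class MiniRecord:
    """Toy record: a sequence string plus an identifier as annotation."""

    def __init__(self, seq, id):
        self.seq = seq
        self.id = id

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing keeps the annotation and returns another record.
            return MiniRecord(self.seq[index], self.id)
        # A single index returns just that letter (assumed behaviour).
        return self.seq[index]

record = MiniRecord("ACGTACGTAC", "demo")
sub = record[1:5]
print(type(sub).__name__, sub.seq, sub.id)
# Slicing the record and then taking .seq matches slicing .seq directly.
print(record.seq[1:5] == record[1:5].seq)
```

The real question for the patch is which annotation survives a slice; here the identifier is simply copied across.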
Peter
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From biopython at maubp.freeserve.co.uk Tue Sep 23 11:13:28 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 16:13:28 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <553736.99426.qm@web62402.mail.re1.yahoo.com>
References: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>
<553736.99426.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00809230813n49c967d8t5f24cad8c9a4009f@mail.gmail.com>
On Tue, Sep 23, 2008 at 1:55 AM, Michiel de Hoon wrote:
> The code in setup.py that causes the DeprecationWarning to appear
> can be fixed relatively easily. Basically, this code was relevant when
> Martel was also being distributed as a separate module. Nowadays,
> Biopython contains the latest version of Martel, so there's no reason
> to check it.
Probably a safe assumption.
> The other function of the code in setup.py was to check if importing
> Martel raises any import errors, in particular for mxTextTools. Then
> we can check by trying to import the dependencies directly, i.e.
> without going through Martel.
As discussed earlier, we've agreed not to worry at install time whether
or not mxTextTools is present.
So basically, you recommend we just remove all the Martel special case
code in setup.py, and simply install it automatically like any other
module? I've made this change locally and it seems to be fine. If
this is what you had in mind, I can commit this to CVS too.
> After fixing the code in setup.py, we can add a DeprecationWarning to Bio.Martel.
Agreed.
Peter
From biopython at maubp.freeserve.co.uk Tue Sep 23 11:19:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 16:19:19 +0100
Subject: [Biopython-dev] Online access,
Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
References: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
Message-ID: <320fb6e00809230819h44b34241t5e8bd15cf5f5043c@mail.gmail.com>
In August 2008 Peter wrote:
> This is a slightly long email covering what to do with the online code
> in Bio.PubMed and Bio.GenBank, and how to make Bio.Entrez easier to
> use. All these modules are essentially wrapping access to the NCBI
> Entrez database via the Entrez Programming Utilities (EUtils).
> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html
One problem I raised in August with the online parts of Bio.PubMed
and Bio.GenBank was they didn't provide a way to supply the user's
email address. An email address can now be specified via Bio.Entrez
BEFORE calling the Bio.PubMed or Bio.GenBank online functions,
however this is non-obvious.
However, we still have the inherent problem that these simple
functions do not allow use of the NCBI's history feature. (Of course,
for some situations this is never going to apply and therefore isn't
a problem).
> In addition to encouraging the use of Bio.Entrez by documenting it
> prominently in the tutorial, we could go further and deprecate the
> "user friendly" Bio.PubMed and Bio.GenBank wrapper functions.
> What do people think of this? Deprecating the Dictionary classes in
> particular could be a good idea as they use the old fashioned parser
> objects.
In the release notes for Biopython 1.48, I wrote:
>> Bio.PubMed and the online code in Bio.GenBank are now considered
>> obsolete, and we intend to deprecate them after the next release.
>> For accessing PubMed and GenBank, please use Bio.Entrez instead.
Are we agreed on deprecating (some or all of) these bits for Biopython
1.49? I'm happy to put the question to the main mailing list first.
Peter
From biopython at maubp.freeserve.co.uk Tue Sep 23 12:04:54 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 17:04:54 +0100
Subject: [Biopython-dev] test_ParserSupport.py
Message-ID: <320fb6e00809230904i48ed262fl183822cdc23ac212@mail.gmail.com>
The module Bio/ParserSupport.py provides general code for
scanner/consumer parsers (some of which historically have been written
using Martel). It is used in several pure python parsers (e.g.
Bio.SwissProt). However, Tests/test_ParserSupport.py (its unit test)
did use Martel explicitly for the EventGenerator class, and would fail
if mxTextTools is not installed. I have removed this part of the test
in CVS.
Looking over the codebase with grep, EventGenerator is used in:
* Bio.ECell (deprecated in 1.46)
* Bio.Emboss.Primer [STILL CURRENT]
* Bio.IntelliGenetics (deprecated in 1.48)
* Bio.MetaTool (deprecated in 1.48)
* Bio.NBRF (deprecated in 1.48)
It looks like Bio.Emboss.Primer is the only current bit of code using
Bio.ParserSupport.EventGenerator, so it would be nice to still have
this covered by the unit test. Does anyone fancy re-writing the
EventGenerator part of the unit test? I think this could be done by
creating a simple python Scanner object...
Peter
From mjldehoon at yahoo.com Tue Sep 23 19:26:39 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 23 Sep 2008 16:26:39 -0700 (PDT)
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809230813n49c967d8t5f24cad8c9a4009f@mail.gmail.com>
Message-ID: <202271.5433.qm@web62402.mail.re1.yahoo.com>
> So basically, you recommend we just remove all the Martel
> special case
> code in setup.py, and simply install it automatically like
> any other
> module? I've made this change locally and it seems to
> be fine. If
> this is what you had in mind, I can commit this to CVS too.
>
Yes, I think that that is a good solution.
--Michiel.
From mjldehoon at yahoo.com Tue Sep 23 19:37:01 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 23 Sep 2008 16:37:01 -0700 (PDT)
Subject: [Biopython-dev] test_ParserSupport.py
In-Reply-To: <320fb6e00809230904i48ed262fl183822cdc23ac212@mail.gmail.com>
Message-ID: <463161.32038.qm@web62401.mail.re1.yahoo.com>
Bio.Emboss.Primer was also deprecated in 1.48 (replaced by Bio.Emboss.Primer3 and Bio.Emboss.PrimerSearch).
--Michiel.
--- On Tue, 9/23/08, Peter wrote:
> From: Peter
> Subject: [Biopython-dev] test_ParserSupport.py
> To: "BioPython-Dev Mailing List"
> Date: Tuesday, September 23, 2008, 12:04 PM
> The module Bio/ParserSupport.py provides general code for
> scanner/consumer parsers (some of which historically have
> been written
> using Martel). It is used in several pure python parsers
> (e.g.
> Bio.SwissProt). However, Tests/test_ParserSupport.py (its
> unit test)
> did use Martel explicitly for the EventGenerator class, and
> would fail
> if mxTextTools is not installed. I have removed this part
> of the test
> in CVS.
>
> Looking over the codebase with grep, EventGenerator is used
> in:
> * Bio.ECell (deprecated in 1.46)
> * Bio.Emboss.Primer [STILL CURRENT]
> * Bio.IntelliGenetics (deprecated in 1.48)
> * Bio.MetaTool (deprecated in 1.48)
> * Bio.NBRF (deprecated in 1.48)
>
> It looks like Bio.Emboss.Primer is the only current bit of
> code using
> Bio.ParserSupport.EventGenerator, so it would be nice to
> still have
> this covered by the unit test. Does anyone fancy
> re-writing the
> EventGenerator part of unit test? I think this could be
> done by
> creating a simple python Scanner object...
>
> Peter
From biopython at maubp.freeserve.co.uk Wed Sep 24 04:41:41 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 09:41:41 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <202271.5433.qm@web62402.mail.re1.yahoo.com>
References: <320fb6e00809230813n49c967d8t5f24cad8c9a4009f@mail.gmail.com>
<202271.5433.qm@web62402.mail.re1.yahoo.com>
Message-ID: <320fb6e00809240141k2f3f2507xe58094cd0ddc79d8@mail.gmail.com>
On Wed, Sep 24, 2008 at 12:26 AM, Michiel de Hoon wrote:
>> So basically, you recommend we just remove all the Martel
>> special case code in setup.py, and simply install it
>> automatically like any other module? I've made this
>> change locally and it seems to be fine. If this is what
>> you had in mind, I can commit this to CVS too.
>
> Yes, I think that that is a good solution.
OK, change made in CVS.
Peter
From biopython at maubp.freeserve.co.uk Wed Sep 24 04:53:05 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 09:53:05 +0100
Subject: [Biopython-dev] test_ParserSupport.py
In-Reply-To: <463161.32038.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00809230904i48ed262fl183822cdc23ac212@mail.gmail.com>
<463161.32038.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00809240153o6fd697acw9a6a7b6b952e0c3d@mail.gmail.com>
On Wed, Sep 24, 2008 at 12:37 AM, Michiel de Hoon wrote:
>
> Bio.Emboss.Primer was also deprecated in 1.48 (replaced by Bio.Emboss.Primer3 and Bio.Emboss.PrimerSearch).
Thanks - I clearly didn't check carefully enough, I just read the CVS
comments. I've made a slight revision to Bio.Emboss.Primer in CVS to
expand the module docstring (and say it is deprecated) plus moved the
deprecation warning above the Martel import (which would fail if
mxTextTools wasn't installed - meaning the deprecation warning wasn't
shown).
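As a rough illustration of why the ordering matters, the sketch below issues the warning before attempting an import that fails. The function and the dependency name are made up for the example; this is not the actual Bio.Emboss.Primer code.

```python
import warnings


def load_deprecated_module():
    """Mimic a deprecated module whose optional dependency may be missing."""
    # Warn first, so the message is issued even if the import below fails.
    warnings.warn("This module is deprecated.", DeprecationWarning, stacklevel=2)
    # Hypothetical dependency name, chosen so that the import fails.
    import no_such_dependency_for_this_sketch  # noqa: F401


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        load_deprecated_module()
        import_failed = False
    except ImportError:
        import_failed = True

print(import_failed, [str(w.message) for w in caught])
```

With the two statements the other way round, the ImportError would be raised before the warning was ever issued.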
I *think* this means Bio.ParserSupport.EventGenerator is now only
being used in deprecated modules, so the lack of a unit test covering
this is less important.
Peter
From bugzilla-daemon at portal.open-bio.org Wed Sep 24 06:44:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 24 Sep 2008 06:44:15 -0400
Subject: [Biopython-dev] [Bug 2489] KDTree NN search without specifying
radius
In-Reply-To:
Message-ID: <200809241044.m8OAiF3w009359@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2489
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-24 06:44 EST -------
These C++ suggestions are now obsolete, as the C++ part of Bio.KDTree has been
re-written in plain C (in CVS after the release of Biopython 1.48). This was
to simplify the build process as the C++ code had problems on some platforms.
Making the radius optional in KDTree searches is still a potentially useful
enhancement...
From biopython at maubp.freeserve.co.uk Wed Sep 24 12:58:21 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 17:58:21 +0100
Subject: [Biopython-dev] Versions of numpy/Numeric
Message-ID: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
Hi again,
I was wondering what versions of numpy and Numeric have been tested
with Biopython CVS? For anyone who didn't know, you can check at the
python prompt with:
import numpy
print numpy.version.version
and,
import Numeric
print Numeric.__version__
Using CVS Biopython compiled from source, the unit tests all seem fine
on the following three setups:
Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1
Test suite looks fine
Linux, python 2.5, Numeric 24.2 and numpy 1.0
Fine, ignoring the Numeric eigenvalue problem in
test_SVDSuperimposer.py previously discussed
Linux, python 2.3, numpy 1.1.1 [no Numeric]
Fine, after fixing some broken imports which were using recent python
syntax, and reducing the number of decimal places used in
test_SVDSuperimposer.py (numpy and Numeric give very slightly
different answers).
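For anyone wondering what "reducing the number of decimal places" amounts to, here is a small sketch. The helper name and the two values are invented for illustration; they are not taken from test_SVDSuperimposer.py.

```python
def agree_to_places(a, b, places):
    """Return True if a and b differ by nothing once rounded to `places` decimals."""
    return round(a - b, places) == 0


# Two made-up results that differ only in the seventh decimal place.
from_one_library = 1.0000001
from_other_library = 1.0000004

print(agree_to_places(from_one_library, from_other_library, 5))
print(agree_to_places(from_one_library, from_other_library, 12))
```

Comparing at fewer places tolerates tiny numerical differences between libraries, at the cost of a slightly weaker test.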
Note that testing where there is NO version of Numeric is important
(as in this third example), as if both numpy and Numeric are installed
currently most of the pure python modules will use Numeric by choice.
Also note that running the test suite via run_tests.py will hide any
deprecation warnings from numpy - I tried running
test_SVDSuperimposer.py on its own and got:
/home/xxx/lib/python2.3/site-packages/numpy/lib/utils.py:114:
DeprecationWarning: ('matrixmultiply is deprecated, use dot',)
I've now updated Bio/SVDSuperimposer/SVDSuperimposer.py to use dot
instead of matrixmultiply (this works on both numpy and Numeric).
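A small sketch of how such a warning can be made visible and inspected. The two functions below are plain-Python stand-ins written for this example (they are not the numpy or Numeric implementations); only the warnings module calls are standard library.

```python
import warnings


def dot(a, b):
    """Plain-Python dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matrixmultiply(a, b):
    """Hypothetical deprecated alias that forwards to dot()."""
    warnings.warn("matrixmultiply is deprecated, use dot",
                  DeprecationWarning, stacklevel=2)
    return dot(a, b)


with warnings.catch_warnings(record=True) as caught:
    # Make sure DeprecationWarning is not filtered out.
    warnings.simplefilter("always", DeprecationWarning)
    result = matrixmultiply([1, 2, 3], [4, 5, 6])

messages = [str(w.message) for w in caught]
print(result, messages)
```

When running a single test script by hand, passing -W always to the python interpreter should have a similar effect of showing warnings that would otherwise be filtered.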
Peter
From bsouthey at gmail.com Wed Sep 24 14:19:42 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 24 Sep 2008 13:19:42 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
Message-ID: <48DA84BE.7080105@gmail.com>
Peter wrote:
> Hi again,
>
> I was wondering what versions of numpy and Numeric have been tested
> with Biopython CVS? For anyone who didn't know, you can check at the
> python prompt with:
>
> import numpy
> print numpy.version.version
>
Actually just do
numpy.__version__
Currently numpy 1.2 is at the second release candidate stage. Note that
this version requires Python 2.4 and uses the nose testing framework
version 0.10 or later for testing.
Somewhat related to this, what is the appropriate way to find the
version of BioPython installed within Python?
Bruce
From bsouthey at gmail.com Wed Sep 24 15:22:35 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 24 Sep 2008 14:22:35 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
Message-ID: <48DA937B.8000901@gmail.com>
Peter wrote:
> Hi again,
>
> I was wondering what versions of numpy and Numeric have been tested
> with Biopython CVS? For anyone who didn't know, you can check at the
> python prompt with:
>
> import numpy
> print numpy.version.version
>
> and,
>
> import Numeric
> print Numeric.__version__
>
> Using CVS Biopython compiled from source, the unit tests all seem fine
> on the following three setups:
>
> Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1
> Test suite looks fine
>
> Linux, python 2.5, Numeric 24.2 and numpy 1.0
> Fine, ignoring the Numeric eigenvalue problem in
> test_SVDSuperimposer.py previously discussed
>
> Linux, python 2.3, numpy 1.1.1 [no Numeric]
> Fine, after fixing some broken imports which were using recent python
> syntax, and reducing the number of decimal places used in
> test_SVDSuperimposer.py (numpy and Numeric give very slightly
> different answers).
>
> Note that testing where there is NO version of Numeric is important
> (as in this third example), as if both numpy and Numeric are installed
> currently most of the pure python modules will use Numeric by choice.
>
> Also note that running the test suite via run_tests.py will hide any
> deprecation warnings from numpy - I tried running
> test_SVDSuperimposer.py on its own and got:
> /home/xxx/lib/python2.3/site-packages/numpy/lib/utils.py:114:
> DeprecationWarning: ('matrixmultiply is deprecated, use dot',)
> I've now updated Bio/SVDSuperimposer/SVDSuperimposer.py to use dot
> instead of matrixmultiply (this works on both numpy and Numeric).
>
> Peter
Hi,
With Numeric 24.2 and Numpy 1.2rc2 installed on linux 64bit and Python
2.5.1.
python setup.py test gives several deprecation warnings from
test_Cluster and test_KDTree, but both still pass.
test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This
is most likely a 64-bit thing with Python 2.5.
ERROR: test_MarkovModel
----------------------------------------------------------------------
Traceback (most recent call last):
File "run_tests.py", line 152, in runTest
self.runSafeTest()
File "run_tests.py", line 165, in runSafeTest
cur_test = __import__(self.test_name)
File "test_MarkovModel.py", line 65, in
p_emission=p_emission
File
"/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py",
line 220, in _baum_welch
lpseudo_initial, lpseudo_transition,
lpseudo_emission,)
File
"/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py",
line 276, in _baum_welch_one
lp_initial[:] = lp_arcout_t[:,0]
I did notice that both test_MarkovModel.py and test_SVDSuperimposer.py
first try to import Numeric - as does MarkovModel.py. However this
same bug is still likely since numpy.oldnumeric is used.
Regards
Bruce
From biopython at maubp.freeserve.co.uk Wed Sep 24 16:37:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 21:37:43 +0100
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <48DA937B.8000901@gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
<48DA937B.8000901@gmail.com>
Message-ID: <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com>
> Hi,
> With Numeric 24.2 and Numpy 1.2rc2 installed on linux 64bit and Python
> 2.5.1.
I don't currently have access to a machine with that setup - so this
is very useful. Thanks!
> python setup.py test gives several deprecation warnings from
> test_Cluster and test_KDTree but still pass.
Interesting - these may be deprecations for Numpy 1.2 - if you have the
output handy could you share it? If it's very long, you can just send
it to me off the list (or file a bug with the details).
> test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This is
> most likely a 64-bit thing with Python 2.5.
>
> ERROR: test_MarkovModel
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> ...
> line 276, in _baum_welch_one
> lp_initial[:] = lp_arcout_t[:,0]
>
> I did notice that both test_MarkovModel.py and test_SVDSuperimposer.py have
> first try to import Numeric - as does MarkovModel.py.
Yeah - I'd raised that on the dev list earlier. Depending on how we
do this, we could end up with misleading import failures (an error
about Numeric when we really care about numpy). If however no-one
objects to completely dropping Numeric for the next release, things
become much simpler.
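One way to keep the preference order explicit, and the failure message honest, is sketched below. The helper name is made up for this example and the module names passed to it are stand-ins so that the sketch runs anywhere; a real caller might pass ["numpy", "Numeric"] to prefer numpy when both are installed.

```python
import importlib


def import_first_available(names):
    """Return (name, module) for the first importable name in `names`."""
    for name in names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue
    # Report every candidate, rather than only the last one tried.
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(names))


# Stand-in names: the first does not exist, so the second is used.
name, module = import_first_available(["no_such_module_for_this_sketch", "math"])
print(name)
```

Because the final error lists all candidates, a missing numpy would not be reported as a missing Numeric.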
> However this same bug is still likely since numpy.oldnumeric is used.
If you have both Numeric and numpy installed, this module is probably
using Numeric and thus still fails. Could you try flipping the
imports round in .../Bio/MarkovModel.py to see if this problem goes
away (i.e. make sure it uses numpy instead of Numeric)?
If the problem is still there, would you mind also trying the
workaround you suggested on Bug 2589 please (dropping the [:] from the
left-hand side)? If that works for you on both numpy and Numeric it
seems a worthwhile change for CVS.
Thanks
Peter
From biopython at maubp.freeserve.co.uk Wed Sep 24 17:12:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 22:12:24 +0100
Subject: [Biopython-dev] determining the version
Message-ID: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
>> I was wondering what versions of numpy and Numeric have been tested
>> with Biopython CVS? For anyone who didn't know, you can check at the
>> python prompt with:
>>
>> import numpy
>> print numpy.version.version
>>
>
> Actually just do
> numpy.__version__
That is nicer :)
> Somewhat related to this, what is the appropriate way to find the version of
> BioPython installed within Python?
So I'm not the only person to have wondered about this. For now, I
can only suggest an ugly workaround:
import Martel
print Martel.__version__
Since Biopython 1.45, by convention the Martel version has been
incremented to match that of Biopython. Of course, in a few releases'
time we probably won't be including Martel any more.
Perhaps we should add a __version__ to Bio/__init__.py for future
releases, with the release "script" modified to ensure this gets
incremented to match that used in setup.py (and Martel).
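If a __version__ attribute were added, checking it could look like the sketch below. The helper and the two throwaway modules are hypothetical; they only show the convention of reading __version__ with a fallback for packages that do not define it.

```python
import types


def get_version(module, default="unknown"):
    """Return module.__version__ if present, otherwise `default`."""
    return getattr(module, "__version__", default)


# Two throwaway modules stand in for a package with and without __version__.
with_version = types.ModuleType("with_version")
with_version.__version__ = "1.48"
without_version = types.ModuleType("without_version")

print(get_version(with_version), get_version(without_version))
```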
Peter
From bsouthey at gmail.com Wed Sep 24 17:52:04 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 24 Sep 2008 16:52:04 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
<48DA937B.8000901@gmail.com>
<320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com>
Message-ID: <48DAB684.4030103@gmail.com>
Peter wrote:
>> Hi,
>> With Numeric 24.2 and Numpy 1.2rc2 installed on linux 64bit and Python
>> 2.5.1.
>>
>
> I don't currently have access to a machine with that setup - so this
> is very useful. Thanks!
>
>
>> python setup.py test gives several deprecation warnings from
>> test_Cluster and test_KDTree but still pass.
>>
>
> Interesting - these may be deprecations for Numpy 1.2 - if you have the
> output handy could you share it? If it's very long, you can just send
> it to me off the list (or file a bug with the details).
>
The not very informative milestone list says the release also "includes a few some minor
API breakage first scheduled in the 1.1 release":
http://scipy.org/scipy/numpy/milestone/1.2.0
But these deprecations are among them! There is no such warning in the numpy 1.1 code.
There is an email from Jul 13, 2008
(http://www.nabble.com/Newly-deprecated-API-functions-td18436792.html)
noting that these functions are deprecated and scheduled for removal in Numpy
1.3. So rather annoying!
Sorry, I did not realize until I looked that these were 1.2 related. So
the relevant output is:
test_Cluster ... test_Cluster.py:584: DeprecationWarning:
PyArray_FromDims: use
PyArray_SimpleNew.
matrix = distancematrix(data, mask=mask,
weight=weight)
test_Cluster.py:584: DeprecationWarning:
PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
matrix = distancematrix(data, mask=mask,
weight=weight)
test_Cluster.py:629: DeprecationWarning: PyArray_FromDims: use
PyArray_SimpleNew.
clusterid, error, nfound = kmedoids(matrix,
npass=1000)
test_Cluster.py:629: DeprecationWarning:
PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
clusterid, error, nfound = kmedoids(matrix,
npass=1000)
test_Cluster.py:129: DeprecationWarning: PyArray_FromDims: use
PyArray_SimpleNew.
clusterid, error, nfound = kcluster(data, nclusters=nclusters,
mask=mask, weight=weight, transpose=0, npass=100, method='a',
dist='e')
test_Cluster.py:129: DeprecationWarning:
PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
clusterid, error, nfound = kcluster(data, nclusters=nclusters,
mask=mask, weight=weight, transpose=0, npass=100, method='a',
dist='e')
test_Cluster.py:166: DeprecationWarning: PyArray_FromDims: use
PyArray_SimpleNew.
clusterid, error, nfound = kcluster(data, nclusters=3, mask=mask,
weight=weight, transpose=0, npass=100, method='a',
dist='e')
test_Cluster.py:166: DeprecationWarning:
PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
clusterid, error, nfound = kcluster(data, nclusters=3, mask=mask,
weight=weight, transpose=0, npass=100, method='a',
dist='e')
test_Cluster.py:522: DeprecationWarning: PyArray_FromDims: use
PyArray_SimpleNew.
clusterid, celldata = somcluster(data=data, mask=mask, weight=weight,
transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100,
dist='e')
test_Cluster.py:522: DeprecationWarning:
PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
clusterid, celldata = somcluster(data=data, mask=mask, weight=weight,
transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100,
dist='e')
test_Cluster.py:555: DeprecationWarning: PyArray_FromDims: use
PyArray_SimpleNew.
clusterid, celldata = somcluster(data=data, mask=mask, weight=weight,
transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100,
dist='e')
test_Cluster.py:555: DeprecationWarning:
PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
clusterid, celldata = somcluster(data=data, mask=mask, weight=weight,
transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100,
dist='e')
ok
test_KDTree ...
/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/KDTree/KDTree.py:71:
DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew.
r=kdt.get_indices()
/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/KDTree/KDTree.py:71:
DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use
PyArray_NewFromDescr.
r=kdt.get_indices()
ok
>
>> test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This is
>> most likely a 64-bit thing with Python 2.5.
>>
>> ERROR: test_MarkovModel
>> ----------------------------------------------------------------------
>> Traceback (most recent call last):
>> ...
>> line 276, in _baum_welch_one
>> lp_initial[:] = lp_arcout_t[:,0]
>>
>> I did notice that both test_MarkovModel.py and test_SVDSuperimposer.py have
>> first try to import Numeric - as does MarkovModel.py.
>>
>
> Yeah - I'd raised that on the dev list earlier. Depending on how we
> do this, we could end up with misleading import failures (an error
> about Numeric when we really care about numpy). If however no-one
> objects to completely dropping Numeric for the next release, things
> become much simpler.
>
>
>> However this same bug is still likely since numpy.oldnumeric is used.
>>
>
> If you have both Numeric and numpy installed, this module is probably
> using Numeric and thus still fails. Could you try flipping the
> imports round in .../Bio/MarkovModel.py to see if this problem goes
> away (i.e. make sure it uses numpy instead of Numeric)?
>
> If the problem is still there, would you mind also trying the work
> around you suggested on Bug 2589 please (dropping the [:] from the
> left-hand side)? If that works for you on both numpy and Numeric it
> seems a worthwhile change for CVS.
>
> Thanks
>
> Peter
>
Luckily MarkovModel.py is almost self-contained so I used it
independently of the installation. The test passes as is with numpy and
if I drop the [:] it passes with both Numeric and numpy import statements.
Bruce
From bugzilla-daemon at portal.open-bio.org Wed Sep 24 18:24:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 24 Sep 2008 18:24:02 -0400
Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48
In-Reply-To:
Message-ID: <200809242224.m8OMO2R2019335@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2589
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-24 18:24 EST -------
(In reply to comment #3)
> > test_MarkovModel
> > ================
>
> 24.2
>
> Based on a Google search, this is a 64bit problem with Python 2.5 and Numeric.
>
> So either do:
> 1) Drop the [:] from the left-hand side:
> lp_initial = lp_arcout_t[:,0]
Over on the mailing list, Bruce reported this fix works for both numpy and
Numeric. I've now checked this into CVS, MarkovModel.py revision 1.6.
Thanks Bruce!
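For reference, the difference between the two forms can be shown with plain Python lists. This is only an analogy under that assumption: it uses lists rather than numpy or Numeric arrays, the function names are made up, and it does not reproduce the 64-bit failure itself.

```python
def assign_in_place(target, values):
    """Slice assignment: copies `values` into the existing `target` object."""
    target[:] = values


def rebind(target, values):
    """Plain assignment: rebinds the local name only; the caller's object is untouched."""
    target = values
    return target


buffer_a = [0.0, 0.0, 0.0]
assign_in_place(buffer_a, [0.2, 0.3, 0.5])

buffer_b = [0.0, 0.0, 0.0]
returned = rebind(buffer_b, [0.2, 0.3, 0.5])

print(buffer_a, buffer_b, returned)
```

So dropping the [:] is only equivalent where the new value is picked up through the name itself (for example by being returned), not through the original object.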
From biopython at maubp.freeserve.co.uk Wed Sep 24 18:24:55 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 23:24:55 +0100
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <48DAB684.4030103@gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
<48DA937B.8000901@gmail.com>
<320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com>
<48DAB684.4030103@gmail.com>
Message-ID: <320fb6e00809241524r48ee004xfff68ff5a58ef736@mail.gmail.com>
> The not very informative list says it also ''includes a few some minor API
> breakage first scheduled in the 1.1 release"
> http://scipy.org/scipy/numpy/milestone/1.2.0
>
> But this is one of them! There is no message in the numpy 1.1 code. There is
> an email on Jul 13, 2008
> (http://www.nabble.com/Newly-deprecated-API-functions-td18436792.html) that
> notes these are deprecated and scheduled for removal in Numpy 1.3. So
> rather annoying!
Yes - this does seem annoying :(
> Sorry, I did not realize until I looked that these were 1.2 related. So the
> relevant output is:
So in summary, the warnings from numpy 1.2 were multiple cases of the
following where the old functions will be removed in numpy 1.3:
PyArray_FromDims to PyArray_SimpleNew.
PyArray_FromDimsAndDataAndDescr to PyArray_NewFromDescr
Bio.Cluster will therefore need updating at some point - the next
question is when were PyArray_SimpleNew and PyArray_NewFromDescr
introduced...
>>> test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This is
>>> most likely a 64-bit thing with Python 2.5.
>
> Luckily MarkovModel.py is almost self-contained so I used it independently
> of the installation. The test passes as is with numpy and if I drop the [:]
> it passes with both Numeric and numpy import statements.
Great - I've checked that into CVS now.
Thanks,
Peter
From bsouthey at gmail.com Wed Sep 24 21:09:39 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 24 Sep 2008 20:09:39 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00809241524r48ee004xfff68ff5a58ef736@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
<48DA937B.8000901@gmail.com>
<320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com>
<48DAB684.4030103@gmail.com>
<320fb6e00809241524r48ee004xfff68ff5a58ef736@mail.gmail.com>
Message-ID:
On Wed, Sep 24, 2008 at 5:24 PM, Peter wrote:
>> The not very informative list says it also ''includes a few some minor API
>> breakage first scheduled in the 1.1 release"
>> http://scipy.org/scipy/numpy/milestone/1.2.0
>>
>> But this is one of them! There is no message in the numpy 1.1 code. There is
>> an email on Jul 13, 2008
>> (http://www.nabble.com/Newly-deprecated-API-functions-td18436792.html) that
>> notes these are deprecated and scheduled for removal in Numpy 1.3. So
>> rather annoying!
>
> Yes - this does seem annoying :(
>
>> Sorry, I did not realize until I looked that these were 1.2 related. So the
>> relevant output is:
>
> So in summary, the warnings from numpy 1.2 were multiple cases of the
> following where the old functions will be removed in numpy 1.3:
>
> PyArray_FromDims to PyArray_SimpleNew.
> PyArray_FromDimsAndDataAndDescr to PyArray_NewFromDescr
>
> Bio.Cluster will therefore need updating at some point - the next
> question is when were PyArray_SimpleNew and PyArray_NewFromDescr
> introduced...
>
I have never used the C-API so I looked at what I have available. The
earliest numpy code I have is 0.9.6 and these are defined in the
header file:
numpy/core/include/numpy/arrayobject.h
These are mentioned in the file numpy/doc/CAPI.txt ('Created:
October 2005') present at least in the versions from numpy-1.0.1 to
numpy-1.1.1 (I don't see it in the release candidate tarball):
"``PyArray_SimpleNew`` is just a macro for ``PyArray_New`` with
default arguments.
Use ``PyArray_FILLWBYTE(arr, 0)`` to fill with zeros.
The ``PyArray_FromDims`` and family of functions are still available and
are loose wrappers around this function. These functions still take
``int *`` arguments. This should be fine on 32-bit systems, but on 64-bit
systems you may run into trouble if you frequently passed
``PyArray_FromDims`` the dimensions member of the old
``PyArrayObject`` structure
because ``sizeof(npy_intp) != sizeof(int)``.
"
Regards
Bruce
From mjldehoon at yahoo.com Thu Sep 25 04:47:10 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Thu, 25 Sep 2008 01:47:10 -0700 (PDT)
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
Message-ID: <63700.34226.qm@web62405.mail.re1.yahoo.com>
> Perhaps we should add a __version__ to Bio/__init__.py for
> future releases, with the release "script" modified to
> ensure this gets incremented to match that used in
> setup.py (and Martel).
Another solution is that setup.py uses (reads or imports) __init__.py to find out what the version is. For example, this is what matplotlib does in its setup.py script.
--Michiel.
From biopython at maubp.freeserve.co.uk Thu Sep 25 05:22:56 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Sep 2008 10:22:56 +0100
Subject: [Biopython-dev] determining the version
In-Reply-To: <63700.34226.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
<63700.34226.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
On Thu, Sep 25, 2008 at 9:47 AM, Michiel de Hoon wrote:
>
>> Perhaps we should add a __version__ to Bio/__init__.py for
>> future releases, with the release "script" modified to
>> ensure this gets incremented to match that used in
>> setup.py (and Martel).
>
> Another solution is that setup.py uses (reads or imports)
> __init__.py to find out what the version is. For example,
> this is what matplotlib does in its setup.py script.
>
That sounds more sensible - I had been wondering about
how that could be automated but it was late last night.
After a quick look at the approach taken in the matplotlib
code, we could add something like this to setup.py:
__version__ = "Undefined"
for line in open('Bio/__init__.py'):
    if (line.startswith('__version__')):
        exec(line.strip())
setup(
    name='biopython',
    version=__version__,
    author='The Biopython Consortium',
    ...
I'm happy to deal with this if we are agreed that we
should add a __version__ to Bio/__init__.py
(variations on the naming are possible, but this seems
to be a de-facto standard in python libraries).
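The same lookup could also be done without exec by matching the assignment with a regular expression. This is only a sketch: the helper name read_version and the pattern are made up for illustration, and it assumes the version is assigned as a plain quoted string on a single line.

```python
import re

# Hypothetical helper (not part of Biopython or matplotlib): pull the
# __version__ string out of a file without executing any of its code.
VERSION_LINE = re.compile(r"""^__version__\s*=\s*['"]([^'"]*)['"]""")


def read_version(init_path, default="Undefined"):
    """Return the __version__ string assigned in init_path, or default."""
    with open(init_path) as handle:
        for line in handle:
            match = VERSION_LINE.match(line)
            if match:
                return match.group(1)
    return default
```

In setup.py this would be called as version=read_version('Bio/__init__.py'), so the version string only needs to be edited in one place.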
Peter
From bugzilla-daemon at portal.open-bio.org Thu Sep 25 07:51:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 25 Sep 2008 07:51:13 -0400
Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython
In-Reply-To:
Message-ID: <200809251151.m8PBpDEr028468@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2251
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-25 07:51 EST -------
We will also need to deal with the following functions, which are
deprecated in numpy 1.2 and will be removed in numpy 1.3, by switching:
PyArray_FromDims to PyArray_SimpleNew
PyArray_FromDimsAndDataAndDescr to PyArray_NewFromDescr
Quoting the numpy CAPI.txt file (thanks Bruce!),
------ start quote -------
``PyArray_SimpleNew(nd, dims, typenum)`` is a drop-in replacement
for ``PyArray_FromDims`` (except it takes ``npy_intp*`` dims
instead of ``int*`` dims which matters on 64-bit systems) and it
does not initialize the memory to zero.
``PyArray_SimpleNew`` is just a macro for ``PyArray_New`` with
default arguments. Use ``PyArray_FILLWBYTE(arr, 0)`` to fill
with zeros.
The ``PyArray_FromDims`` and family of functions are still
available and are loose wrappers around this function. These
functions still take ``int *`` arguments. This should be fine
on 32-bit systems, but on 64-bit systems you may run into
trouble if you frequently passed ``PyArray_FromDims`` the
dimensions member of the old ``PyArrayObject`` structure
because ``sizeof(npy_intp) != sizeof(int)``.
------ end quote -------
Here is a recent example of dealing with this - switching part
of scipy and how the pointer issue complicates things:
http://projects.scipy.org/pipermail/scipy-dev/2008-August/009581.html
http://scipy.org/scipy/scipy/ticket/723
See also
http://scipy.org/scipy/numpy/ticket/805
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
From bsouthey at gmail.com Thu Sep 25 09:28:03 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 25 Sep 2008 08:28:03 -0500
Subject: [Biopython-dev] determining the version
In-Reply-To: <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>
<63700.34226.qm@web62405.mail.re1.yahoo.com>
<320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>
Message-ID: <48DB91E3.3060402@gmail.com>
Numpy uses its version.py file to obtain the version, and this will also
include the svn revision if an svn checkout of numpy is being used. The
advantage is that you can follow the developers' changes to find when
something was fixed or broken. I think the same idea would work for
Biopython, especially once it moves to svn.
For the 1.2.0rc2:
>>> numpy.__version__
'1.2.0rc2'
Bruce
From biopython at maubp.freeserve.co.uk Thu Sep 25 12:15:54 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Sep 2008 17:15:54 +0100
Subject: [Biopython-dev] Sequences and simple plots
Message-ID: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
Hi all,
I've just added a couple of Bio.SeqIO with pylab examples to the cookbook
chapter of the Biopython tutorial.
The first shows a histogram of sequence lengths in a FASTA file (based
on having recently done this for some real assembly data).
http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
The second is based on the GC% example we used for the BOSC 2008
presentation (see http://biopython.org/wiki/Documentation#Presentations
for the original). http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
If anyone has any suggestions for similar examples let me know (with code
would be great - but even a nice idea is worthwhile).
Peter
From biopython at maubp.freeserve.co.uk Thu Sep 25 12:58:37 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Sep 2008 17:58:37 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
Message-ID: <320fb6e00809250958y3605b932w391cab1e7c3507f9@mail.gmail.com>
> If anyone has any suggestions for similar examples let me know (with code
> would be great - but even a nice idea is worthwhile).
How about this example which draws a simple nucleotide dot plot for
the first two sequences in the input FASTA file?
#Step One, load the first two sequences as input
from Bio import SeqIO
handle = open("ls_orchid.fasta")
record_iterator = SeqIO.parse(handle, "fasta")
rec_one = record_iterator.next()
rec_two = record_iterator.next()
handle.close()
print "Comparing %s to %s" % (rec_one.id, rec_two.id)

#Step Two, compile a similarity matrix
# For simplicity, this is constructed as a list of lists
# of booleans (using a mismatch threshold would be more
# complicated). Also I'm recording mismatches rather than
# matches because that gives a nice image with the pylab
# gray colour scheme used later.
window = 7
seq_one = rec_one.seq.tostring()
seq_two = rec_two.seq.tostring()
# Rows (i) index seq_two and columns (j) index seq_one, so the
# image axes agree with the x and y labels set below.
data = [[(seq_one[j:j+window] != seq_two[i:i+window]) \
         for j in range(len(seq_one)-window)] \
        for i in range(len(seq_two)-window)]

#Step Three, plot using pylab
import pylab
pylab.gray()
pylab.imshow(data)
pylab.xlabel("%s (length %i bp)" % (rec_one.id, len(rec_one)))
pylab.ylabel("%s (length %i bp)" % (rec_two.id, len(rec_two)))
pylab.title("Dot plot using window size %i\n(allowing no mismatches)" \
            % window)
#pylab.show()
pylab.savefig("dot_plot.png", dpi=75)
pylab.savefig("dot_plot.pdf")
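The matrix step can be tried on its own with short strings, without Biopython or pylab. The following is a minimal sketch under stated assumptions: the function name mismatch_matrix and the toy sequences are invented for illustration, and the range limits simply mirror the ones used in the message above.

```python
def mismatch_matrix(seq_one, seq_two, window):
    """Return rows of booleans, True where the two windows differ.

    Rows follow seq_two and columns follow seq_one, so the result can be
    handed to an image plot with seq_one along the x axis.
    """
    return [
        [seq_one[j:j + window] != seq_two[i:i + window]
         for j in range(len(seq_one) - window)]
        for i in range(len(seq_two) - window)
    ]


# Toy input, not real data.
matrix = mismatch_matrix("ACGTAC", "ACGT", 2)
for row in matrix:
    # "." marks a matching window, "#" a mismatch.
    print("".join("#" if cell else "." for cell in row))
```

Printing the rows as text gives a quick check that the diagonal of matches falls where expected before any plotting is involved.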
Peter
From jflatow at northwestern.edu Thu Sep 25 14:34:00 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Thu, 25 Sep 2008 13:34:00 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
Message-ID: <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
Hi Peter,
Good ideas for some useful examples! (though I can't actually find
them in the cookbook...)
Anyway I hope you don't mind but while I was looking, I added another
example for SeqIO input/output that uses the new format method:
http://biopython.org/wiki/SeqIO
I tend to prefer this type of method to SeqIO.write, though I don't
think it appears anywhere in the documentation
Regards,
jared
From thomas at cbs.dtu.dk Thu Sep 25 14:57:52 2008
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Thu, 25 Sep 2008 20:57:52 +0200
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
Message-ID: <48DBDF30.3060106@cbs.dtu.dk>
Hej all,
as I am guilty of writing most of the functions in SeqUtils/__init__.py, I
might as well join the cleaning team ...
apply_on_multi_fasta and quicker_apply_on_multi_fasta were only
functions to turn the original SeqUtils.py into a possible
standalone program, but I guess not many people actually used them.
On the other hand, quick_FASTA_reader was and still is used by a lot of
people, despite the irritating splitting bug which occurs if an entry
name happens to contain '>' ...
Also, the translate and complement functions are from the time when
these functions were not easily accessible (we are talking about 2001-2002).
In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and
the redundant translation machinery could and should be removed. Also,
could someone change the split function in quick_FASTA_reader? (I
haven't had checkin access for a long time.)
Are there any other dubious functions we should discuss?
cheers
-thomas
--
Sicheritz-Ponten Thomas, Associate Professor, Ph.D (
Head of Metagenomics, Technical University of Denmark \
Center for Biological Sequence Analysis, BioCentrum )
CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##----->
Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas )
/
... damn arrow eating trees ... (
Peter wrote:
> Dear all,
>
> I've previously mentioned the idea of cleaning up
> Bio/SeqUtils/__init__.py in passing. I've been reminded about this by
> Bug 2585 where Sebastian spotted a problem in one of the FASTA related
> functions.
> http://bugzilla.open-bio.org/show_bug.cgi?id=2585
>
> I've updated the docstrings in CVS to describe the three functions
> quick_FASTA_reader, apply_on_multi_fasta and
> quicker_apply_on_multi_fasta as obsolete but I would like to suggest
> going further and deprecating them.
>
> There are other dubious or redundant functions in
> Bio/SeqUtils/__init__.py such as a translate function. Again, would
> there be any objection to deprecating this too?
>
> Peter
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
From biopython at maubp.freeserve.co.uk Thu Sep 25 15:39:49 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Sep 2008 20:39:49 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
Message-ID: <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
On Thu, Sep 25, 2008 at 7:34 PM, Jared Flatow wrote:
>
> Hi Peter,
>
> Good ideas for some useful examples! (though I can't actually find them in
> the cookbook...)
They are in CVS only at the moment - I can send you the PDF of the
current tutorial off list if you like. We don't normally update the
tutorial on the website except as part of making a new release - this
avoids the tutorial talking about unreleased code.
> Anyway I hope you don't mind but while I was looking, I added another
> example for SeqIO input/output that uses the new format method:
>
> http://biopython.org/wiki/SeqIO
>
> I tend to prefer this type of method to SeqIO.write, though I don't think it
> appears anywhere in the documentation
The format method should be in the Tutorial as of Biopython 1.48 (see
the final section of Chapters 4 and 5).
Personally I think that using "with" in your new example just confuses
things, but otherwise mentioning the format() method in this context
makes sense. I would probably make it explicit that this will ONLY
work for sequential file formats - which is why I prefer to encourage
the SeqIO.write() method, giving it all the records at once (possibly as
an iterator).
Peter
From biopython at maubp.freeserve.co.uk Thu Sep 25 15:50:05 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Sep 2008 20:50:05 +0100
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <48DBDF30.3060106@cbs.dtu.dk>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
<48DBDF30.3060106@cbs.dtu.dk>
Message-ID: <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
On Thu, Sep 25, 2008 at 7:57 PM, Thomas Sicheritz-Ponten
wrote:
> Hej all,
>
> as I am guilty for most of the functions in SeqUtils/__init__.py, I might as
> well join the cleaning team ...
Excellent :)
> apply_on_multi_fasta and quicker_apply_on_multi_fasta were only functions to
> the turn the original SeqUtils.py into a possible standalone program, but I
> guess not many actually used it.
That would explain some of that module's style. We could deprecate
the standalone bit too when we deprecate these functions.
> On the other hand quick_FASTA_reader was and still is used by a lot of
> people, despite the irritating splitting bug which occurs if an entry name
> happens to contain '>' ...
We should probably fix that if you think it can be done without
losing the current simplicity and speed (see below).
> Also, the translate and complement functions are from the time were these
> functions were not easily accessed (we are talking about 2001-2002)
That does make sense - it's a shame, with hindsight, that Biopython ended
up with several ways to do this.
> In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and the
> redundant translation machinery could and should get removed.
OK. We should probably ask on the main list as a courtesy, and then
deprecate them for the next release.
> Also if one can change the split function in quick_FASTA_reader? (I don't
> have had checkin access since a long time)
If this is just an expired account / lost password you could try
emailing the OBF support guys directly. If they need someone to vouch
for you drop me or Michiel an email off list. In the short term I'm
happy to check in a patch on your behalf (by email or via a bug
report).
> Are there any other dubios functions we should discuss?
I'm sure there are more - but that should keep us busy for now :)
Are you happy with my recent tweak to the seq3 function (CVS revision
1.15)? I wasn't 100% sure why it had used "Xer"
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py.diff?r1=1.14&r2=1.15&cvsroot=biopython
Thanks,
Peter
From jblanca at btc.upv.es Thu Sep 25 10:49:34 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Thu, 25 Sep 2008 16:49:34 +0200
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com>
References: <200809231440.24684.jblanca@btc.upv.es>
<320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com>
Message-ID: <200809251649.34934.jblanca@btc.upv.es>
Hi:
On Tuesday 23 September 2008 16:37:29 Peter wrote:
> > SeqRecord still doesn't have a __getitem__ method.
>
> What do you think of the __getitem__ method proposed in attachment 942
> on Bug 2507?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
I've been looking at the patch and it is just what I need.
Using a SeqRecord with that __getitem__ method is almost trivial. Attached to
this email, in mySeqRecord.py, is a possible implementation. What do you
think?
For the qualities a tuple of ints would do.
For implementing some details new-style classes would be better. Are you
planning to move Seq and SeqRecord to the new style?
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mySeqRecord.py
Type: application/x-python
Size: 10949 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqrecordtest.py
Type: application/x-python
Size: 4587 bytes
Desc: not available
URL:
From thomas at cbs.dtu.dk Thu Sep 25 18:47:58 2008
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Fri, 26 Sep 2008 00:47:58 +0200
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk>
<320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
Message-ID: <48DC151E.3090802@cbs.dtu.dk>
Peter, can you check in the corrected version of quick_FASTA_reader for
me? I added the changes which were suggested in earlier posts (changes
not affecting speed and simplicity)

def quick_FASTA_reader(file):
    "simple and quick FASTA reader to be used on large FASTA files"
    from os import linesep
    txt = open(file).read()
    entries = []
    splitter = "%s>" % linesep
    for entry in txt.split(splitter):
        name,seq= entry.split(linesep,1)
        if name[0]=='>': name = name[1:]
        seq = seq.replace('\n','').replace(' ','').upper()
        entries.append((name, seq))
    return entries

Concerning the seq3 function, I am not sure where it came from, I don't
think I have added it.
cheers
-thomas
--
Sicheritz-Ponten Thomas, Associate Professor, Ph.D (
Head of Metagenomics, Technical University of Denmark \
Center for Biological Sequence Analysis, BioCentrum )
CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##----->
Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas )
/
... damn arrow eating trees ... (
From biopython at maubp.freeserve.co.uk Fri Sep 26 05:38:57 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 10:38:57 +0100
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <48DC151E.3090802@cbs.dtu.dk>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
<48DBDF30.3060106@cbs.dtu.dk>
<320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
<48DC151E.3090802@cbs.dtu.dk>
Message-ID: <320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com>
On Thu, Sep 25, 2008 at 11:47 PM, Thomas Sicheritz-Ponten
wrote:
> Peter, can you check in the corrected version of quick_FASTA_reader for me?
> I added the changes which were suggested in earlier posts (changes not
> affecting speed and simplicity)
>
> def quick_FASTA_reader(file):
>     "simple and quick FASTA reader to be used on large FASTA files"
>     from os import linesep
>     txt = open(file).read()
>     entries = []
>     splitter = "%s>" % linesep
>     for entry in txt.split(splitter):
>         name,seq= entry.split(linesep,1)
>         if name[0]=='>': name = name[1:]
>         seq = seq.replace('\n','').replace(' ','').upper()
>         entries.append((name, seq))
>     return entries
I'm pretty sure we shouldn't be using os.linesep in this way. I'd
have to double check on a Windows box to confirm this, but I believe
from memory that any CRLF in the file becomes just a \n in python.
The basic idea is we want to split on "\n>" so that any additional ">"
inside a name is ignored. This then means the first record in the
file is a special case. You've also added an extra if statement in
the loop - I assume to cope with the fact that using a split on "\n>"
would leave a leading ">" on the first record's name -- but this would
go wrong if the name itself started with a ">" too (i.e. a line
starting with ">>..." which would be unusual).
Perhaps instead, as a typical FASTA file starts immediately with ">",
we can just prepend "\n" to the contents of the file before doing the
split. I've updated CVS based on this, and added a minimal test for
quick_FASTA_reader (and GC) to test_SeqUtils.py as well.
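The idea described above can be sketched roughly as follows. This is not the code checked in as revision 1.17; the function name split_fasta_text is hypothetical, it takes the file contents as a string so it can be run standalone, and it assumes line endings have already been normalised to "\n".

```python
def split_fasta_text(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    entries = []
    # Prepending "\n" makes the first record look like every other one,
    # so no special case is needed inside the loop.
    for entry in ("\n" + text).split("\n>")[1:]:
        # partition copes with a record that has a title line only.
        name, _, seq = entry.partition("\n")
        seq = seq.replace("\n", "").replace(" ", "").upper()
        entries.append((name, seq))
    return entries


# A ">" inside a title, and even a title starting with ">", is preserved.
example = ">a>b desc\nacgt\nAC GT\n>>odd\nttt\n"
records = split_fasta_text(example)
```

Because only "\n>" is treated as a record boundary, a ">" anywhere other than the start of a line stays part of the name.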
Checking in Bio/SeqUtils/__init__.py;
/home/repository/biopython/biopython/Bio/SeqUtils/__init__.py,v <--
__init__.py
new revision: 1.17; previous revision: 1.16
done
Checking in Tests/test_SeqUtils.py;
/home/repository/biopython/biopython/Tests/test_SeqUtils.py,v <--
test_SeqUtils.py
new revision: 1.2; previous revision: 1.1
done
Checking in Tests/output/test_SeqUtils;
/home/repository/biopython/biopython/Tests/output/test_SeqUtils,v <--
test_SeqUtils
new revision: 1.2; previous revision: 1.1
done
Could you have a look at Bio/SeqUtils/__init__.py revision 1.17 for
review? It will be up on ViewCVS shortly...
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py?cvsroot=biopython
Do you think I should remove the "OBSOLETE" tag in the docstring for
the quick_FASTA_reader function?
> Concerning the seq3 function, I am not sure where it came from, I don't
> think I have added it.
OK, thanks.
Peter
From biopython at maubp.freeserve.co.uk Fri Sep 26 05:50:39 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 10:50:39 +0100
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <200809251649.34934.jblanca@btc.upv.es>
References: <200809231440.24684.jblanca@btc.upv.es>
<320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com>
<200809251649.34934.jblanca@btc.upv.es>
Message-ID: <320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com>
On Thu, Sep 25, 2008 at 3:49 PM, Jose Blanca wrote:
> Hi:
>
> On Tuesday 23 September 2008 16:37:29 Peter wrote:
>> > SeqRecord still doesn't have a __getitem__ method.
>>
>> What do you think of the __getitem__ method proposed in attachment 942
>> on Bug 2507?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
> I've been looking at the patch and is just what I need.
> Using a SeqRecord with that __getitem__ method is almost trivial.
Good :)
I'd like to check this into CVS but it would be best to have a third
person comment on the code first.
Once (if) this is included, I would then plan to use this for slicing
alignment objects (Bug 2551)
http://bugzilla.open-bio.org/show_bug.cgi?id=2551
> Attach to this email inside mySeqRecord.py is a possible implementation.
> What do you think? For the qualities a tuple of ints would do.
I see you have created a subclass of SeqRecord to add a quality
property, and made sure this gets sliced too in the __getitem__. This
is a nice approach (and demonstrates how people could extend the basic
Biopython objects in their own code). I would also suggest in the
__init__ method checking that the quality sequence is the same length
as the sequence itself. Your code looks like it would cope with any
python sequence object (string, list, tuple) for the quality, and you
could use integers or floats here. Very flexible.
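The two suggestions above (slice the quality together with the sequence, and check the lengths in __init__) might look something like this. It is a sketch only: QualityRecord is a hypothetical stand-in written without Biopython, not the attached mySeqRecord.py and not a real SeqRecord subclass.

```python
class QualityRecord(object):
    """Toy record holding a sequence and one quality score per letter."""

    def __init__(self, seq, quality):
        # Refuse inconsistent input up front rather than failing later.
        if len(seq) != len(quality):
            raise ValueError("quality must be the same length as the sequence")
        self.seq = seq
        self.quality = quality

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice both together so they always stay in step.
            return QualityRecord(self.seq[index], self.quality[index])
        # A single position returns just the letter.
        return self.seq[index]


record = QualityRecord("ACGTAC", (30, 31, 32, 33, 34, 35))
sub_record = record[1:4]
```

Any Python sequence (string, list, tuple) works for the quality here because only len() and slicing are used.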
If we were to add something like this to Biopython directly, I prefer
"quality" over "qual" (just three letters longer but much clearer). I
would also consider adding the quality to the Seq object (subclassing
the Seq object rather than the SeqRecord object). My reasoning is
that for 454 or Solexa sequencing, you will have thousands of reads
and all you really care about is the nucleotide sequence and the
quality scores. Unless you want to give them all unique names, there
is little point having the overhead of the various annotation properties
of the SeqRecord.
> For implementing some details new style classes would be better. Are you
> planning to move Seq and SeqRecord to the new style?
If we have a good reason to - adding docstrings to the properties would be nice.
Peter
From thomas at cbs.dtu.dk Fri Sep 26 05:54:12 2008
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Fri, 26 Sep 2008 11:54:12 +0200
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com> <48DC151E.3090802@cbs.dtu.dk>
<320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com>
Message-ID: <48DCB144.9020801@cbs.dtu.dk>
Ok, fair enough :-)
Please remove also the OBSOLETE tag - as Bio.SeqIO.parse is not really a
substitution for quick_FASTA_reader
cheers
-thomas
--
Sicheritz-Ponten Thomas, Associate Professor, Ph.D (
Head of Metagenomics, Technical University of Denmark \
Center for Biological Sequence Analysis, BioCentrum )
CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##----->
Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas )
/
... damn arrow eating trees ... (
From biopython at maubp.freeserve.co.uk Fri Sep 26 06:08:04 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 11:08:04 +0100
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <48DCB144.9020801@cbs.dtu.dk>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
<48DBDF30.3060106@cbs.dtu.dk>
<320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
<48DC151E.3090802@cbs.dtu.dk>
<320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com>
<48DCB144.9020801@cbs.dtu.dk>
Message-ID: <320fb6e00809260308q58659aech1b5671ce76f1eeef@mail.gmail.com>
On Fri, Sep 26, 2008 at 10:54 AM, Thomas Sicheritz-Ponten
wrote:
> Ok, fair enough :-)
> Please remove also the OBSOLETE tag - as Bio.SeqIO.parse is not really a
> substitution for quick_FASTA_reader
OK, I've done that and reworded the docstring. I agree that Bio.SeqIO is
not a direct substitute for quick_FASTA_reader but they both have their
plus points.
I'll send out an email to the main list about deprecating the following:
Using Bio/SeqUtils as a script
Bio.SeqUtils.apply_on_multi_fasta
Bio.SeqUtils.quicker_apply_on_multi_fasta
Bio.SeqUtils.translate
What about fasta_uniqids? It reads a file but prints to screen which
doesn't seem useful in a python script.
Peter
From biopython at maubp.freeserve.co.uk Fri Sep 26 06:15:52 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 11:15:52 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
Message-ID: <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
On Thu, Sep 25, 2008 at 8:39 PM, Peter wrote:
> On Thu, Sep 25, 2008 at 7:34 PM, Jared Flatow wrote:
>>
>> Hi Peter,
>>
>> Good ideas for some useful examples! (though I can't actually find them in
>> the cookbook...)
>
> They are in CVS only at the moment - I can send you the PDF of the
> current tutorial if you like off list. We don't normally update the
> tutorial on the website except as part of making a new release - this
> avoids the tutorial talking about unreleased code.
Cut and paste for people to comment on directly,
The first shows a histogram of sequence lengths in a FASTA file (based
on having recently done this for some real assembly data). Sample output:
http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
from Bio import SeqIO
handle = open("ls_orchid.fasta")
sizes = [len(seq_record) for seq_record in SeqIO.parse(handle, "fasta")]
handle.close()
import pylab
pylab.hist(sizes, bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
% (len(sizes),min(sizes),max(sizes)))
pylab.xlabel("Sequence length (bp)")
pylab.ylabel("Count")
pylab.show()
The second is based on the GC% example we used for the BOSC 2008
presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
from Bio import SeqIO
from Bio.SeqUtils import GC
handle = open("ls_orchid.fasta")
gc_values = [GC(seq_record.seq) for seq_record in SeqIO.parse(handle, "fasta")]
gc_values.sort()
handle.close()
import pylab
pylab.plot(gc_values)
pylab.title("%i orchid sequences\nGC%% %0.1f to %0.1f" \
% (len(gc_values),min(gc_values),max(gc_values)))
pylab.xlabel("Genes")
pylab.ylabel("GC%")
pylab.show()
Peter
From biopython at maubp.freeserve.co.uk Fri Sep 26 07:02:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 12:02:00 +0100
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <48DC151E.3090802@cbs.dtu.dk>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com>
<48DBDF30.3060106@cbs.dtu.dk>
<320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
<48DC151E.3090802@cbs.dtu.dk>
Message-ID: <320fb6e00809260402k612b0465uf0326d5c5cb48dff@mail.gmail.com>
>> Are you happy with my recent tweak to the seq3 function (CVS revision
>> 1.15)? I wasn't 100% sure why it had used "Xer"
It just occurred to me this could be short for "X error"?
> Concerning the seq3 function, I am not sure where it came from, I don't
> think I have added it.
>
Looking over the CVS logs, I think it might have been you (CVS user
"thomas") - but it was six years ago.
See Bio/SeqUtils/__init__.py revision 1.2
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py?cvsroot=biopython
The comments say Bio.SeqUtils.seq3 was inspired by BioPerl. I've only
skimmed the BioPerl SVN history, but they do seem to use "Xaa" and not
"Xer",
http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/SeqUtils.pm
Peter
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 08:44:16 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 08:44:16 -0400
Subject: [Biopython-dev] [Bug 2425] Fasta ID parsing error
In-Reply-To:
Message-ID: <200809261244.m8QCiGji013606@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2425
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 08:44 EST -------
(In reply to comment #1)
> I assume in your example you expected "region1.fasta.screen.Contig1" to be
> used as the record key in BioSQL? There is a 40 character limit on this
> field, which should be fine for most FASTA identifiers.
In BioSQL v1.0.1, fields bioentry.accession and dbxref.accession were increased
from 40 to 128 characters. See
http://lists.open-bio.org/pipermail/biosql-l/2008-August/001311.html
However, bioentry.name is still only 40 characters.
It looks like for a FASTA file like this:
>gi|9629357|ref|NC_001802.1| Human immunodeficiency virus type 1, complete genome
GGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCC
TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA
...
BioPerl will use "gi|9629357|ref|NC_001802.1|" as bioentry.name and
bioentry.identifier with "Human immunodeficiency virus type 1, complete genome"
as bioentry.description, 0 as the version (BioSQL convention when unknown),
with bioentry.taxon_id and bioentry.division as NULL.
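As a rough illustration of the mapping described above, here is a minimal
sketch. It assumes the identifier is everything up to the first whitespace
on the ">" line and the rest is the description; split_fasta_title is a
hypothetical helper, not a Biopython or BioPerl function.

```python
# Hypothetical helper for illustration; not part of Biopython or BioPerl.
def split_fasta_title(title_line):
    """Split a FASTA '>' line into (identifier, description)."""
    text = title_line.lstrip(">").strip()
    parts = text.split(None, 1)  # split once, on the first run of whitespace
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


title = (">gi|9629357|ref|NC_001802.1| "
         "Human immunodeficiency virus type 1, complete genome")
name, description = split_fasta_title(title)
print(name)             # candidate for bioentry.name
print(description)      # candidate for bioentry.description
print(len(name) <= 40)  # does it fit the 40 character bioentry.name field?
```

A check like the last line is where an over-long identifier would show up
before it reaches the database.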
From jblanca at btc.upv.es Fri Sep 26 09:18:02 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 26 Sep 2008 15:18:02 +0200
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com>
References: <200809231440.24684.jblanca@btc.upv.es>
<200809251649.34934.jblanca@btc.upv.es>
<320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com>
Message-ID: <200809261518.02453.jblanca@btc.upv.es>
Hi:
> I see you have created a subclass of the SeqRecord to add a quality
> property, and made sure this gets sliced too in the __getitem__. This
> is a nice approach (and demonstrates how people could extend the basic
> Biopython objects in their own code). I would also suggest in the
> __init__ method checking that the quality sequence is the same length
> as the sequence itself.
To do that properly I would like to use property; that's why I was asking
about the possibility of turning SeqRecord and Seq into new-style
classes.
> If we were to add something like this to Biopython directly, I prefer
> "quality" over "qual" (just three letters longer but much clearer).
That's not a problem. I used qual to do it similar to .seq
> I would also consider adding the quality to the Seq object (subclassing
> the Seq object rather than the SeqRecord object). My reasoning is
> that for 454 or Solexa sequencing, you will have thousands of reads
> and all you really care about is the nucleotide sequence and the
> quality scores. Unless you want to give them all unique names, there
> is little point having the overhead of the various annotation properties
> of the SeqRecord.
I didn't subclass Seq because if we want a quality without a name we could just
use a tuple or a list. My idea was to create a class with two main
properties, seq and qual (or quality). Seq does not have a seq property; it is
a sequence. Since SeqRecord already has a seq property, I subclassed it, adding
the qual property. Another alternative would be to create a new
SeqWithQuality class without subclassing SeqRecord.
I looked at the BioPerl model. They have several classes dealing with
sequences and qualities:
Seq: - has a seq property (unlike BioPython's Seq, which is a sequence and has
no seq property). It also has an id or a name.
Qual: - has a qual property, and an id or a name.
SeqWithQual: - has a seq and Qual properties.
I didn't create a Qual class with a qual property and a name because there is
no Seq class with a seq and a name. I thought that a tuple or a list of ints
would be equivalent to BioPython's Seq and would play the part of
BioPerl's Qual.
What do you think about this model?
I agree that these classes should be prepared to deal with a lot of sequences
and they should be efficient. But I don't have the experience to foresee
which model would be better in that regard.
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 09:30:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 09:30:13 -0400
Subject: [Biopython-dev] [Bug 2425] Fasta ID parsing error
In-Reply-To:
Message-ID: <200809261330.m8QDUDwJ016360@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2425
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Resolution| |FIXED
------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 09:30 EST -------
OK, I think this is fixed in CVS now. I have also updated the
test_BioSQL_SeqIO.py unit test to check importing and retrieving a range of
different FASTA files.
Of course, having a second person double check this works would be great. Feel
free to comment here (or reopen the bug) as appropriate.
Peter
From jflatow at northwestern.edu Fri Sep 26 09:40:46 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 26 Sep 2008 08:40:46 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
<320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
Message-ID:
On Sep 26, 2008, at 5:15 AM, Peter wrote:
> Cut and paste for people to comment on directly,
Ok, cool.
> The first shows a histogram of sequence lengths in a FASTA file (based
> having recently done this for some real assembly data). Sample
> output:
> http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
>
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> sizes = [len(seq_record) for seq_record in SeqIO.parse(handle,
> "fasta")]
> handle.close()
>
> import pylab
> pylab.hist(sizes, bins=20)
> pylab.title("%i orchid sequences\nLengths %i to %i" \
> % (len(sizes),min(sizes),max(sizes)))
> pylab.xlabel("Sequence length (bp)")
> pylab.ylabel("Count")
> pylab.show()
It's a perfectly fine example; my only comment would be to do something
like this:
seqs = list(SeqIO.parse(handle, 'fasta'))
hist([len(seq) for seq in seqs], bins=20)
I like to keep the whole sequences in memory, especially if I am just
digging around the data. Also I use the alpha parameter a lot for
histograms, especially when doing overlapping ones. So then you can
also do something like this:
hist([len(seq) for seq in seqs if GC(seq.seq) < .5], bins=20,
     alpha=.5, fc='r')
hist([len(seq) for seq in seqs if GC(seq.seq) >= .5], bins=20,
     alpha=.5, fc='b')
> The second is based on the GC% example we used for the BOSC 2008
> presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
>
> from Bio import SeqIO
> from Bio.SeqUtils import GC
> handle = open("ls_orchid.fasta")
> gc_values = [GC(seq_record.seq) for seq_record in
> SeqIO.parse(handle, "fasta")]
> gc_values.sort()
> handle.close()
>
> import pylab
> pylab.plot(gc_values)
> pylab.title("%i orchid sequences\nGC%% %0.1f to %0.1f" \
> % (len(gc_values),min(gc_values),max(gc_values)))
> pylab.xlabel("Genes")
> pylab.ylabel("GC%")
> pylab.show()
Again, if you had all the sequences in a list:
plot(sorted(GC(seq.seq) for seq in seqs))
jared
From biopython at maubp.freeserve.co.uk Fri Sep 26 09:43:25 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 14:43:25 +0100
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <200809261518.02453.jblanca@btc.upv.es>
References: <200809231440.24684.jblanca@btc.upv.es>
<200809251649.34934.jblanca@btc.upv.es>
<320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com>
<200809261518.02453.jblanca@btc.upv.es>
Message-ID: <320fb6e00809260643w1534f237g4b5314b9a884191e@mail.gmail.com>
Hi Jose,
>> I see you have created a subclass [of] the SeqRecord to add a quality
>> property, and made sure this gets sliced too in the __getitem__. This
>> is a nice approach (and demonstrates how people could extend the basic
>> Biopython objects in their own code). I would also suggest in the
>> __init__ method checking that the quality sequence is the same length
>> as the sequence itself.
>
> To do that in a proper way I would like to use property, that's why I was
> asking for the possibility of transforming SeqRecord and Seq in new style
> classes.
Oh I see - then you could put the length check in the property set method?
Would you like to file an enhancement bug for transforming SeqRecord
and Seq into new style classes, and prepare a patch (for this only)?
If this doesn't cause any problems with the unit tests then I don't
foresee any problems getting that change made.
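A minimal sketch of the property idea discussed above, assuming a stand-in
class rather than the real SeqRecord (RecordWithQuality and its attribute
names are hypothetical). Inheriting from object is what makes the class
new style, so that the property setter is actually honoured:

```python
# Stand-in class for illustration only; the real SeqRecord has more attributes.
class RecordWithQuality(object):  # inheriting from object makes it new style
    def __init__(self, seq, quality):
        self.seq = seq
        self.quality = quality  # goes through the property setter below

    def _get_quality(self):
        return self._quality

    def _set_quality(self, values):
        # The length check lives in the setter, so it also runs on reassignment.
        if len(values) != len(self.seq):
            raise ValueError("quality has %i values but sequence has length %i"
                             % (len(values), len(self.seq)))
        self._quality = list(values)

    quality = property(_get_quality, _set_quality)

    def __getitem__(self, index):
        # Slicing returns a new record with the quality sliced to match;
        # a plain integer index just returns the letter.
        if isinstance(index, slice):
            return RecordWithQuality(self.seq[index], self._quality[index])
        return self.seq[index]


record = RecordWithQuality("ACGT", [10, 20, 30, 40])
sub_record = record[1:3]
print(sub_record.seq)
print(sub_record.quality)
```

Assigning a quality list of the wrong length, either in __init__ or later,
raises ValueError in this sketch.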
>> If we were to add something like this to Biopython directly, I prefer
>> "quality" over "qual" (just three letters longer but much clearer).
>
> That's not a problem. I used qual to do it similar to .seq
Style is often debatable. Sequence is quite long, and seq is fairly
clear. Qual on the other hand could be short for qualifier (a term
used in feature annotation).
>> I would also consider adding the quality to the Seq object (subclassing
>> the Seq object rather than the SeqRecord object). My reasoning is
>> that for 454 or Solexa sequencing, you will have thousands of reads
>> and all you really care about is the nucleotide sequence and the
>> quality scores. Unless you want to give them all unique names, there
>> is little point having the overhead of the various annotation properties
>> of the SeqRecord.
>
> I didn't subclass Seq because if we want a quality without name we could just
> use a tuple or a list. My idea was to create a class with two main
> properties, seq and qual (or quality). ...
> I agree that these classes should be prepared to deal with a lot of sequences
> and they should be efficient. But I don't have the experience to foresee
> which model would be better in that regard.
I haven't had to deal with 454 or solexa sequence data yet (but I am
hoping to in the next six months). Given there are lots of possible
implementation/object structure ideas, I think it might be premature
to pick one for Biopython right now. Would you be happy with the
SeqRecord __getitem__ method (Bug 2507) and creating the subclassed
SeqRecord with quality in your own code? If you find that works well
in real usage, it would be encouraging for us to use it in Biopython. Or
have you already been using something like this for serious data
analysis?
Peter
From jblanca at btc.upv.es Fri Sep 26 10:16:00 2008
From: jblanca at btc.upv.es (Jose Blanca)
Date: Fri, 26 Sep 2008 16:16:00 +0200
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <320fb6e00809260643w1534f237g4b5314b9a884191e@mail.gmail.com>
References: <200809231440.24684.jblanca@btc.upv.es>
<200809261518.02453.jblanca@btc.upv.es>
<320fb6e00809260643w1534f237g4b5314b9a884191e@mail.gmail.com>
Message-ID: <200809261616.00911.jblanca@btc.upv.es>
Hi:
> > To do that in a proper way I would like to use property, that's why I was
> > asking for the possibility of transforming SeqRecord and Seq in new style
> > classes.
>
> Oh I see - then you could put the length check in the property set method?
That's exactly right.
> Would you like to file an enhancement bug for transforming SeqRecord
> and Seq into new style classes, and prepare a patch (for this only)?
> If this doesn't cause any problems with the unit tests then I don't
> foresee any problems getting that change made.
I will, although first I have to work out how to do it. I think I need to
take a look at your developer docs.
> Style is often debatable. Sequence is quite long, and seq is fairly
> clear. Qual on the other hand could be short for qualifier (a term
> used in feature annotation).
I see, you've got a point there.
> I haven't had to deal with 454 or solexa sequence data yet (but I am
> hoping to in the next six months).
I'm working on exactly that right now.
> Given there are lots of possible
> implementation/object structure ideas, I think it might be premature
> to pick one for Biopython right now. Would you be happy with the
> SeqRecord __getitem__ method (Bug 2507) and creating the subclassed
> SeqRecord with quality in your own code? If you find that works well
> in real usage, it would be encouraging for us to use it in Biopython.
That's a great way to do it.
> Or
> have you already been using something like this for serious data
> analysis?
Not yet.
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 11:11:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 11:11:36 -0400
Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for
element access and slicing
In-Reply-To:
Message-ID: <200809261511.m8QFBaOG024019@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2507
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #942 is|0 |1
obsolete| |
------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 11:11 EST -------
Created an attachment (id=998)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=998&action=view)
Updated patch to SeqRecord.py and SeqFeature.py
This updates the patch to work on the current code in CVS (the new format
method has been committed since).
This also makes a small but subtle change to checking the end point of each
feature to determine if it should be included when generating a sub-record.
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 11:52:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 11:52:50 -0400
Subject: [Biopython-dev] [Bug 2596] New: Add string like strip,
rstrip and lstrip methods to the Seq object
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2596
Summary: Add string like strip, rstrip and lstrip methods to the
Seq object
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 2351
As part of Bug 2351 to make the Seq object more string like, it would be nice
to add strip, rstrip and lstrip methods to the Seq object.
The returned Seq object will have the same alphabet as the parent sequence.
While for strings defaulting to removing white space (spaces, tabs, newlines)
makes sense, for sequences there shouldn't be any white space. I think
defaulting to the gap character is more natural here.
Possible implementation:
def strip(self, chars=None) :
    """Returns a new Seq object with leading and trailing ends stripped.

    Optional argument chars defines which characters to remove. If
    omitted or None (default) the gap character will be used (if defined
    for the alphabet, otherwise defaulting to "-").

    In comparison, the string strip method will default to removing
    white space."""
    if chars is None :
        try :
            chars = self.alphabet.gap_char
        except AttributeError :
            chars = "-"
    return Seq(str(self).strip(chars), self.alphabet)

def lstrip(self, chars=None) :
    """Returns a new Seq object with leading (left) end stripped.

    Optional argument chars defines which characters to remove. If
    omitted or None (default) the gap character will be used (if defined
    for the alphabet, otherwise defaulting to "-").

    In comparison, the string lstrip method will default to removing
    white space."""
    if chars is None :
        try :
            chars = self.alphabet.gap_char
        except AttributeError :
            chars = "-"
    return Seq(str(self).lstrip(chars), self.alphabet)

def rstrip(self, chars=None) :
    """Returns a new Seq object with trailing (right) end stripped.

    Optional argument chars defines which characters to remove. If
    omitted or None (default) the gap character will be used (if defined
    for the alphabet, otherwise defaulting to "-").

    In comparison, the string rstrip method will default to removing
    white space."""
    if chars is None :
        try :
            chars = self.alphabet.gap_char
        except AttributeError :
            chars = "-"
    return Seq(str(self).rstrip(chars), self.alphabet)
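To show how the proposed default would behave next to the plain string
method, here is a self-contained sketch. MiniSeq is a hypothetical
stand-in that holds a string and a gap character; it is not the real Seq
class and has no alphabet object.

```python
class MiniSeq(object):
    """Hypothetical stand-in for Seq: a string plus a gap character."""

    def __init__(self, data, gap_char="-"):
        self.data = data
        self.gap_char = gap_char

    def __str__(self):
        return self.data

    def strip(self, chars=None):
        # Same default as proposed above: strip gaps, not white space.
        if chars is None:
            chars = self.gap_char
        return MiniSeq(self.data.strip(chars), self.gap_char)


aligned = MiniSeq("--ACG-T--")
print(str(aligned.strip()))     # leading and trailing gaps go, internal gap stays
print(str(aligned.strip("-A"))) # explicit chars work as for strings
print("  ACGT\n".strip())       # plain strings default to white space
```

Only the ends are touched, so a gap inside the sequence survives, which is
what you would want when trimming one row of an alignment.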
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 11:52:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 11:52:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200809261552.m8QFquJH026306@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
BugsThisDependsOn| |2596
From biopython at maubp.freeserve.co.uk Fri Sep 26 12:11:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 17:11:50 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To:
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
<320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
Message-ID: <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
On Fri, Sep 26, 2008 at 2:40 PM, Jared Flatow wrote:
>
> On Sep 26, 2008, at 5:15 AM, Peter wrote:
>
>> Cut and paste for people to comment on directly,
>
> Ok, cool.
>
>> The first shows a histogram of sequence lengths in a FASTA file (based
>> having recently done this for some real assembly data). Sample output:
>> http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
>>
>> ...
>
> It's a perfectly fine example; my only comment would be to do something like
> this:
>
> seqs = list(SeqIO.parse(handle, 'fasta'))
> hist([len(seq) for seq in seqs], bins=20)
>
> I like to keep the whole sequences in memory, especially if I am just
> digging around the data.
I see what you mean - and maybe that is more realistic.
One change I'd make is to avoid using seq or seqs as variable names
for SeqRecord objects. I've generally tried to use record and records in
the documentation.
i.e. maybe like this:
import pylab
from Bio import SeqIO
records = list(SeqIO.parse(open("ls_orchid.fasta"), "fasta"))
#Histogram of lengths
pylab.hist([len(record) for record in records], bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
% (len(sizes),min(sizes),max(sizes)))
pylab.xlabel("Sequence length (bp)")
pylab.ylabel("Count")
pylab.show()
> Also I use the alpha parameter a lot for histograms, especially when
> doing overlapping ones. So then you can also do something like this:
>
> hist([len(seq) for seq in seqs if GC(seq.seq) < .5], bins=20, alpha=.5,
> fc='r')
> hist([len(seq) for seq in seqs if GC(seq.seq) >= .5], bins=20, alpha=.5,
> fc='b')
>
Fun. I didn't want to get into anything too advanced on the pylab side,
rather I wanted to focus on the bioinformatics. Does anyone else think
more advanced graphical demonstrations would be worthwhile?
>> The second is based on the GC% example we used for the BOSC 2008
>> presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
>>
>> ...
>
> Again, if you had all the sequences in a list:
>
> plot(sorted(GC(seq.seq) for seq in seqs))
I like the use of sorted here, rather than the two step make a list
then sort it.
Peter
From jflatow at northwestern.edu Fri Sep 26 12:23:14 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 26 Sep 2008 11:23:14 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
<320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
<320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
Message-ID: <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
On Sep 26, 2008, at 11:11 AM, Peter wrote:
> I see what you mean - and maybe that is more realistic.
>
> One change I'd make is to avoid using seq or seqs as variable names
> for SeqRecord objects. I've generally tried to use record and
> records in
> the documentation.
Yeah, I agree I was just being lazy.
> i.e. maybe like this:
>
> import pylab
> from Bio import SeqIO
> records = list(SeqIO.parse(open("ls_orchid.fasta"), "fasta"))
>
> #Histogram of lengths
> pylab.hist([len(record) for record in records], bins=20)
> pylab.title("%i orchid sequences\nLengths %i to %i" \
> % (len(sizes),min(sizes),max(sizes)))
> pylab.xlabel("Sequence length (bp)")
> pylab.ylabel("Count")
> pylab.show()
Except the title no longer works the same...maybe just:
pylab.title("Distribution of lengths of %i orchid sequences" %
len(records))
?
jared
From biopython at maubp.freeserve.co.uk Fri Sep 26 12:28:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 17:28:30 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
<320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
<320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
<0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
Message-ID: <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com>
>> i.e. maybe like this:
>>
>> import pylab
>> from Bio import SeqIO
>> records = list(SeqIO.parse(open("ls_orchid.fasta"), "fasta"))
>>
>> #Histogram of lengths
>> pylab.hist([len(record) for record in records], bins=20)
>> pylab.title("%i orchid sequences\nLengths %i to %i" \
>> % (len(sizes),min(sizes),max(sizes)))
>> pylab.xlabel("Sequence length (bp)")
>> pylab.ylabel("Count")
>> pylab.show()
>
> Except the title no longer works the same...maybe just:
>
> pylab.title("Distribution of lengths of %i orchid sequences" % len(records))
>
> ?
I spotted that after posting. Whoops. Your suggestion would work,
but I'd rather keep the old full title (partly so I don't have to redo
the PNG file in CVS and on the website).
Did you try the dot-plot example?
Did you have any other ideas for things to plot?
Peter
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 12:59:47 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 12:59:47 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200809261659.m8QGxlhn030037@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Attachment #972 is|0 |1
obsolete| |
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 12:59 EST -------
(From update of attachment 972)
I think Martin attached this to the wrong bug, see Bug 2547 instead.
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 13:06:32 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 13:06:32 -0400
Subject: [Biopython-dev] [Bug 2597] New: Enforce alphabet letters in Seq
objects
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2597
Summary: Enforce alphabet letters in Seq objects
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
BugsThisDependsOn: 2532
If a Seq object is created with an alphabet with a pre-defined set of letters
(e.g. the IUPAC alphabets) then I think Biopython should validate that the
sequence does indeed only use those letters.
This will catch mis-use of ambiguous sequences with non-ambiguous alphabets,
letters in an unexpected case, and most importantly any unexpected symbols
(e.g. from a parsing problem).
This will impose a performance overhead - which can be avoided if the user
instead chooses to use a generic dna/rna/protein alphabet which does not list
the letters expected.
Note that we will have to resolve Bug 2532 before doing this, as currently some
parts of Biopython are mis-using the upper case only IUPAC alphabet objects
with mixed case sequences.
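A rough sketch of the kind of check meant here, written as a standalone
function so it does not depend on the Seq or Alphabet classes.
check_letters is a hypothetical name, and letters=None stands in for a
generic alphabet that does not list its letters.

```python
def check_letters(sequence, letters):
    """Raise ValueError if sequence uses a symbol that is not in letters.

    letters=None models a generic alphabet with no listed letters, in
    which case nothing is checked (and no overhead is paid).
    """
    if letters is None:
        return
    unexpected = set(sequence) - set(letters)
    if unexpected:
        raise ValueError("Unexpected letters: %s"
                         % ", ".join(sorted(unexpected)))


unambiguous_dna = "GATC"  # upper case only, no ambiguity codes

check_letters("GATTACA", unambiguous_dna)  # fine, returns quietly
check_letters("gattaca", None)             # generic alphabet, not checked

for bad in ["GATNACA", "gattaca", "GAT TACA"]:
    try:
        check_letters(bad, unambiguous_dna)
    except ValueError as err:
        print("%r rejected: %s" % (bad, err))
```

The three rejected strings correspond to the three cases listed above: an
ambiguous letter, an unexpected case, and a stray symbol.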
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 13:06:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 13:06:34 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200809261706.m8QH6YWu030456@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
OtherBugsDependingO| |2597
nThis| |
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 13:13:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 13:13:34 -0400
Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq
objects
In-Reply-To:
Message-ID: <200809261713.m8QHDYqu030777@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2532
------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 13:13 EST -------
(In reply to comment #0)
> Bio.Nexus and Bio.Sequencing.Phd create Seq objects which use these alphabets
> even with mixed case sequences.
>
> This contradicts how I think the alphabet's .letters property is intended
> to be used (although currently this is not enforced by the Seq object).
I actually identified this issue by making the Seq object check the .letters
property as an experiment. I have now filed this as a separate enhancement,
Bug 2597.
From jflatow at northwestern.edu Fri Sep 26 14:39:49 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 26 Sep 2008 13:39:49 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
<320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
<320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
<0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
<320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com>
Message-ID: <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu>
On Sep 26, 2008, at 11:28 AM, Peter wrote:
> Did you try the dot-plot example?
I didn't, but it looked good.
> Did you have any other ideas for things to plot?
Nothing that would be too useful, but just for a demonstration of a
scatter plot and putting the different ideas together, it might be
nice to do something like:
plot([len(rec) for rec in records], [GC(rec.seq) for rec in records],
'o')
jared
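A minimal sketch of the data behind that scatter plot, assuming plain strings in place of SeqRecord objects and a hand-written gc_percent() helper standing in for the GC function (both names are illustrative, not Biopython's API):

```python
def gc_percent(seq):
    """Return the percentage of G and C letters in seq (case-insensitive)."""
    if not seq:
        return 0.0
    upper = seq.upper()
    return 100.0 * (upper.count("G") + upper.count("C")) / len(upper)


def length_vs_gc(sequences):
    """Return two parallel lists: sequence lengths and GC percentages."""
    lengths = [len(s) for s in sequences]
    gc_values = [gc_percent(s) for s in sequences]
    return lengths, gc_values


# Made-up example sequences, purely for illustration
example_sequences = ["ACGT", "GGCC", "ATATAT"]
lengths, gc_values = length_vs_gc(example_sequences)
```

The two lists could then be handed to a plotting call in the same way as the one-liner above, e.g. plot(lengths, gc_values, 'o').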
From biopython at maubp.freeserve.co.uk Fri Sep 26 17:29:27 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 22:29:27 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
<5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
<320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
<320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
<320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
<0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
<320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com>
<52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu>
Message-ID: <320fb6e00809261429i464e0ee8qe81f7090c2141292@mail.gmail.com>
On Fri, Sep 26, 2008 at 7:39 PM, Jared Flatow wrote:
> On Sep 26, 2008, at 11:28 AM, Peter wrote:
>
>> Did you try the dot-plot example?
>
> I didn't, but it looked good.
Hopefully I've pitched it right - I've tried to make it as simple as
possible, but the nested list comprehension is perhaps non-obvious.
>> Did you have any other ideas for things to plot?
>
> Nothing that would be too useful, but just for a demonstration of a scatter
> plot and putting the different ideas together, it might be nice to do
> something like:
>
> plot([len(rec) for rec in records], [GC(rec.seq) for rec in records], 'o')
>
I had wondered about this but I couldn't see an obvious motivation -
plus on the parsing side there is nothing new. How about plotting
melting temperature against sequence length (or against the GC%)? This
would be more interesting as we'd then also get to show the
calculation of another sequence property (using the
Bio.SeqUtils.MeltingTemp module).
Peter
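A rough sketch of the melting-temperature idea, using the simple Wallace rule of thumb for short oligos (Tm = 2*(A+T) + 4*(G+C)) as a stand-in. Bio.SeqUtils.MeltingTemp would presumably use a more careful model, and the function names below are illustrative only:

```python
def wallace_tm(seq):
    """Estimate melting temperature (deg C) with the Wallace rule of thumb."""
    upper = seq.upper()
    at_count = upper.count("A") + upper.count("T")
    gc_count = upper.count("G") + upper.count("C")
    return 2 * at_count + 4 * gc_count


def length_vs_tm(sequences):
    """Return (length, estimated Tm) pairs, ready to be split into x and y."""
    return [(len(s), wallace_tm(s)) for s in sequences]


# Made-up example oligos, purely for illustration
points = length_vs_tm(["ACGT", "GGCCGG", "ATATATAT"])
```

Plotting the first element of each pair against the second would give the length versus Tm scatter plot described above.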
From mjldehoon at yahoo.com Sun Sep 28 07:43:17 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 28 Sep 2008 04:43:17 -0700 (PDT)
Subject: [Biopython-dev] Numeric / NumPy conversion
Message-ID: <621531.40325.qm@web62406.mail.re1.yahoo.com>
Hi everybody,
Since there were no responses on the mailing list asking to maintain the old Numerical Python alongside the new NumPy, I suggest that we proceed towards a NumPy-only release of Biopython.
--Michiel.
From bugzilla-daemon at portal.open-bio.org Sun Sep 28 20:57:20 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 28 Sep 2008 20:57:20 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809290057.m8T0vKw3020416@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #21 from drpatnaik at yahoo.com 2008-09-28 20:57 EST -------
I have tried the latest 'NCBIStandalone.py' file, from CVS (version 1.77). The
variable values are as mentioned in comment #16.
I no longer get the error from 'os.path.exists'. However, I still get the
"'C:/Documents' is not recognized..." error in the error file.
By adding a print command to the 'NCBIStandalone.py' file, I can see that the
system command being initiated by Python is:
"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p
blastn -d "\"C:\Documents and Settings\patnaik\My
Documents\blast\bin\hairpin.fa.db\"" -i "C:\Documents and Settings\patnaik\My
Documents\blast\bin\30a.seq" -m 7
This command works as such if run outside Python. If I put it directly inside
the 'os.popen3' call in the 'NCBIStandalone.py' file, I still get the
"'C:/Documents' is not recognized..." error. The same happens if I run a Python file with
this code:
import os
# The command is built from adjacent raw strings so that no string literal
# is broken across lines by mail wrapping.
my_cmd = (r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe"'
          r' -p blastn'
          r' -d "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.fa.db\""'
          r' -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq"'
          r' -m 7')
w, r, e = os.popen3(my_cmd)
print e.read()
It seems that using the 'subprocess' module is the only way around this.
From bugzilla-daemon at portal.open-bio.org Sun Sep 28 21:31:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Sun, 28 Sep 2008 21:31:26 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809290131.m8T1VQ9X022896@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #22 from drpatnaik at yahoo.com 2008-09-28 21:31 EST -------
Following up on comment #21 re: subprocess module:
I am able to get the local BLAST through Biopython to work if I replace the 'w,
r, e = os.popen3(" ".join([blastcmd] + params))' line for 'blastall' in the
'NCBIStandalone.py' file to:
import subprocess
my_process = subprocess.Popen(" ".join([blastcmd] + params))
w, r, e = (my_process.stdin, my_process.stdout, my_process.stderr)
The BLAST results just scroll by on the command-line console application's
screen, so this is very crude. I am new to Python, and I hardly know anything
about 'subprocess'. Perhaps this information will help the developers.
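For what it is worth, the output scrolls past on the console because Popen was not asked for pipes, so the stdin, stdout and stderr attributes are all None. A minimal sketch of capturing the output instead, written for a current Python rather than 2.5, and using the running Python interpreter as a stand-in for blastall.exe (the helper name is made up). Passing the arguments as a list leaves the quoting of spaces to subprocess:

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command given as a list of arguments; return (stdout, stderr) text."""
    process = subprocess.Popen(
        args,
        stdout=subprocess.PIPE,   # without this, output goes straight to the console
        stderr=subprocess.PIPE,
        universal_newlines=True,  # return text rather than bytes
    )
    out, err = process.communicate()
    return out, err


# Stand-in command: echo back one argument that contains a space
out, err = run_and_capture(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", "My Documents"]
)
```

Whether this removes the need for the extra quoting around the database path is untested here; it only shows the capture mechanism.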
***
--- C:/Documents and Settings/patnaik/My
Documents/Python252/Lib/Site-packages/Bio/Blast/NCBIStandalone.py ---
[CVS version 1.77 with the changes outlined above]
--- File C:/Documents and Settings/patnaik/Desktop/test.py ---
# My test file
my_blast_db = r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""'
my_blast_file = r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin"'
my_blast_exe = r'C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe'
from Bio.Blast import NCBIStandalone
result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn",
                                                      my_blast_db, my_blast_file)
# Save anything BLAST wrote to stderr
error_results = error_handle.read()
save_file = open(r"C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_error", "w")
save_file.write(error_results)
save_file.close()
# Save the BLAST output itself
result_results = result_handle.read()
save_file = open(r"C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_result", "w")
save_file.write(result_results)
save_file.close()
--- Run command ---
python "C:\Documents and Settings\patnaik\Desktop\test.py"
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 05:12:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 05:12:06 -0400
Subject: [Biopython-dev] [Bug 2600] New: enhance Seq and SeqRecord to new
style classes
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2600
Summary: enhance Seq and SeqRecord to new style classes
Product: Biopython
Version: 1.48
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: jblanca at btc.upv.es
In some situations it would be quite useful to have new-style classes. I find
the property() built-in, which is available on new-style classes, especially
useful. I have run the tests with and without this modification and found no
difference at all.
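A small sketch of the kind of thing this enables, using a made-up MiniSeq class rather than the real Seq (in current Python every class is new-style, so the explicit object base only mirrors the Python 2 spelling):

```python
class MiniSeq(object):
    """Hypothetical stand-in for Seq, used only to illustrate property()."""

    def __init__(self, data):
        self._data = data

    @property
    def data(self):
        """Read-only view of the underlying string."""
        return self._data

    @property
    def length(self):
        """Computed on access rather than stored."""
        return len(self._data)


example = MiniSeq("ACGT")
```

Here example.data and example.length read like plain attributes, but assigning to either raises AttributeError because no setter is defined.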
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 05:13:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 05:13:45 -0400
Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style
classes
In-Reply-To:
Message-ID: <200809290913.m8T9Dj44021461@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2600
------- Comment #1 from jblanca at btc.upv.es 2008-09-29 05:13 EST -------
Created an attachment (id=999)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=999&action=view)
patch to transform Seq and SeqRecord into new style classes
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 07:30:15 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 07:30:15 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809291130.m8TBUFLo029232@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 07:30 EST -------
Just to confirm: Dealing with spaces in filenames on Windows is horrible, isn't
it?
There appear to be problems with os.popen3 with a quoted executable name with
spaces when there are additional arguments (and for BLAST calls there will
always be extra arguments). Switching from os.popen3 to the subprocess module
(python 2.4+ only) might help, but spaces are still tricky here.
I think the best solution is to get rid of the spaces on Windows. In your case
you can't move BLAST, but you can call it via the DOS 8.3 style alternative
filename (which won't have any spaces). You'll have to install Mark Hammond's
win32 extensions from https://sourceforge.net/projects/pywin32/ to do this,
using the win32api.GetShortPathName() function.
Right now I suggest you try this in your own code before calling
Bio.Blast.NCBIStandalone.blastall() to "fix" the exe name, and if needed the
database and input filenames too. Assuming this works nicely, we can put a
note in the documentation.
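A minimal sketch of that workaround, assuming the pywin32 extensions are installed on Windows. The helper name short_path is made up, and on other platforms (or without pywin32) it simply returns the path unchanged:

```python
import os


def short_path(path):
    """Return the DOS 8.3 short form of an existing path on Windows.

    Falls back to returning the path unchanged when win32api is unavailable.
    """
    try:
        import win32api  # part of the pywin32 extensions, Windows only
    except ImportError:
        return path
    return win32api.GetShortPathName(path)


# Example: shorten an existing directory (the current one) before using it
exe_dir = short_path(os.getcwd())
```

The same call could be applied to the BLAST executable, database and input filenames before passing them to Bio.Blast.NCBIStandalone.blastall().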
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:00:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 08:00:03 -0400
Subject: [Biopython-dev] [Bug 2596] Add string like split, strip,
rstrip and lstrip methods to the Seq object
In-Reply-To:
Message-ID: <200809291200.m8TC039u030491@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2596
biopython-bugzilla at maubp.freeserve.co.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Summary|Add string like strip, |Add string like split,
|rstrip and lstrip methods to|strip, rstrip and lstrip
|the Seq object |methods to the Seq object
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 08:00 EST -------
Adding split onto this bug as discussed on the mailing list. See Bug 2351
comment 15.
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:01:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 08:01:37 -0400
Subject: [Biopython-dev] [Bug 2596] Add string like split, strip,
rstrip and lstrip methods to the Seq object
In-Reply-To:
Message-ID: <200809291201.m8TC1boE030672@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2596
------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 08:01 EST -------
Created an attachment (id=1000)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=1000&action=view)
Patch to Bio/Seq.py for Seq object split, strip, lstrip and rstrip methods
As discussed on the mailing lists, this differs from the previous suggestions
by following the string defaults (split or strip using white space characters).
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:02:55 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 08:02:55 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string,
even subclass string?
In-Reply-To:
Message-ID: <200809291202.m8TC2sJM030759@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2351
------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 08:02 EST -------
(In reply to comment #15)
> This is a suggested implementation of the split method for our Seq object,
> modelled after the python string method which it calls internally. Note that I
> have made the separator non-optional on the grounds that the string method's
> default of white space isn't (usually) sensible for sequences. I'm happy to
> change this if people think it's better to be as close as possible to the string
> method.
>
> def split(self, sep, maxsplit=None) :
>     """Split method, like that of a python string.
>
>     Return a list of the 'words' in the string (as Seq objects),
>     using sep as the delimiter string. If maxsplit is given, at
>     most maxsplit splits are done.
>
>     Unlike the python string method, sep must be specified (as
>     there shouldn't be any whitespace strings in a sequence).
>
>     e.g. print my_seq.split("-")
>     """
>     # Test against None so that maxsplit=0 (no splits) is honoured
>     if maxsplit is not None :
>         parts = self.data.split(sep, maxsplit)
>     else :
>         parts = self.data.split(sep)
>     return [Seq(chunk, self.alphabet) for chunk in parts]
>
After some debate on the mailing list, following the python string method
defaults is probably preferable for consistency (even if we don't expect any
white space in a Seq object's sequence).
I have extended Bug 2596 to cover the split method in addition to the strip
methods, and uploaded a revised patch there.
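A minimal sketch of what following the string defaults could look like, using a made-up MiniSeq class in place of Seq (this is not the attached patch, just an illustration of the behaviour discussed):

```python
class MiniSeq(object):
    """Hypothetical stand-in for Seq: a string plus an alphabet label."""

    def __init__(self, data, alphabet=None):
        self.data = data
        self.alphabet = alphabet

    def split(self, sep=None, maxsplit=-1):
        """Split like a python string; sep=None means split on white space."""
        parts = self.data.split(sep, maxsplit)
        return [MiniSeq(chunk, self.alphabet) for chunk in parts]


pieces = MiniSeq("ACG-TT-A", "dna").split("-")
```

With these defaults, MiniSeq("AC GT").split() behaves like "AC GT".split(), and each piece keeps the parent's alphabet.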
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:35:34 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 08:35:34 -0400
Subject: [Biopython-dev] [Bug 2601] New: Seq find() method: proposal
Message-ID:
http://bugzilla.open-bio.org/show_bug.cgi?id=2601
Summary: Seq find() method: proposal
Product: Biopython
Version: Not Applicable
Platform: All
OS/Version: All
Status: NEW
Severity: enhancement
Priority: P2
Component: BioSQL
AssignedTo: biopython-dev at biopython.org
ReportedBy: lpritc at scri.sari.ac.uk
A find() method for the Seq object was recently proposed on the mailing list.
I have extended Seq locally to include a find method that uses the re module
and the reverse_complement function from Bio.Seq, and is described below. In
the original implementation, the search was meant to be called from the parent
SeqRecord object, which populated itself with features describing the search
results.
I'm proposing this as a potential starting point for the implementation of a
Seq.find() method.
Note that the loop of re.search() calls was necessary to obtain the set of
overlapping matches, as re.finditer() only returns non-overlapping matches.
The two functions searching in forward-only and reverse-only directions could
probably be combined, and behaviour distinguished on keyword, for neater code.
####
def find_regexes(self, pattern):
    """ find_regexes(self, pattern)
    pattern String, regular expression to search for
    Finds all occurrences of the passed regular expression in the
    sequence, and returns a list of tuples in the format:
    (start, end, match, strand).
    If the sequence is a nucleotide sequence, the reverse strand is
    also searched
    """
    # Find forward matches (reported with strand 1)
    match_locations = [(hit.start()+1, hit.end(),
                        self.data[hit.start():hit.end()], 1)
                       for hit in self.__find_overlapping_regexes(pattern)]
    # If the sequence is a nucleotide sequence, look on the reverse
    # strand, too
    if self.alphabet.__class__ in [Alphabet.DNAAlphabet,
                                   Alphabet.RNAAlphabet,
                                   IUPAC.ExtendedIUPACDNA,
                                   IUPAC.IUPACAmbiguousDNA,
                                   IUPAC.IUPACUnambiguousDNA,
                                   IUPAC.IUPACAmbiguousRNA,
                                   IUPAC.IUPACUnambiguousRNA]:
        # Reverse-strand matches are reported with strand -1 so that they
        # can be told apart from forward matches
        rev_locations = [(hit.start()+1, hit.end(),
                          self.data[hit.start():hit.end()], -1)
                         for hit in
                         self.__find_overlapping_regexes_rev(pattern)]
        match_locations += rev_locations
    match_locations.sort()
    return match_locations
def __find_overlapping_regexes(self, pattern):
    """ Finds all overlapping regexes matching the passed pattern in the
    sequence, and returns a list of re.SRE_Match objects describing
    them.
    """
    hits = []
    pos = 0
    regex = re.compile(pattern)
    while pos < len(self.data):
        hit = regex.search(self.data, pos=pos)
        if hit is None:
            break
        hits.append(hit)
        pos = hit.start()+1
    return hits
def __find_overlapping_regexes_rev(self, pattern):
    """ Finds all overlapping regexes matching the passed pattern in the
    sequence, and returns a list of re.SRE_Match objects describing
    them, as hits positioned in the forward direction - i.e. start and
    end read in the forward sense.
    """
    hits = []
    pos = 0
    regex = re.compile(reverse_complement(Seq(pattern, self.alphabet)))
    while pos < len(self.data):
        hit = regex.search(self.data, pos=pos)
        if hit is None:
            break
        hits.append(hit)
        pos = hit.start()+1
    return hits
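To illustrate the point about overlapping matches, here is a small standalone sketch on a plain string (the function name overlapping_spans is made up). It uses the same restart-one-past-the-last-start loop as the methods above and compares the result with re.finditer():

```python
import re


def overlapping_spans(pattern, text):
    """Return (start, end) spans of all matches, including overlapping ones."""
    regex = re.compile(pattern)
    spans = []
    pos = 0
    while pos < len(text):
        hit = regex.search(text, pos)
        if hit is None:
            break
        spans.append(hit.span())
        # Restart one position after this match's start, not after its end
        pos = hit.start() + 1
    return spans


non_overlapping = [m.span() for m in re.finditer("AA", "AAAA")]
overlapping = overlapping_spans("AA", "AAAA")
```

For "AA" in "AAAA", finditer reports two matches while the loop reports three, the extra one being the match starting at offset 1.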
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:36:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 08:36:09 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291236.m8TCa9Qk032562@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2601
lpritc at scri.sari.ac.uk changed:
What |Removed |Added
----------------------------------------------------------------------------
Component|BioSQL |Main Distribution
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 09:24:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 09:24:13 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291324.m8TDODXq002611@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2601
------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 09:24 EST -------
Note that any Seq.find() method should be as like the string find method as
possible for consistency. One enhancement is that it might be worth checking
the search string is valid against the Seq object's alphabet (see also Bug
2597).
However, reserving Seq.find() for this string find like behaviour doesn't stop
us adding more advanced regular expression based methods.
P.S. To determine if a sequence has a nucleotide alphabet, use the fact that
any well defined nucleotide alphabet object should be a subclass of
Bio.Alphabet.NucleotideAlphabet() rather than checking a predefined list.
However, there is no way of knowing if the sequence is double stranded or
single sided, so personally I don't like the way your suggested function
automatically searches the reverse complement strand too.
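A minimal sketch of the subclass check, using made-up stand-in classes that only mirror the names of the Bio.Alphabet hierarchy so the example runs on its own:

```python
# Stand-in classes (assumptions, not the real Bio.Alphabet module)
class Alphabet(object):
    pass


class NucleotideAlphabet(Alphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class ProteinAlphabet(Alphabet):
    pass


def is_nucleotide(alphabet):
    """True for any alphabet object derived from NucleotideAlphabet."""
    return isinstance(alphabet, NucleotideAlphabet)


dna_result = is_nucleotide(DNAAlphabet())
protein_result = is_nucleotide(ProteinAlphabet())
```

Unlike a hard-coded list of classes, the isinstance() test also covers any nucleotide alphabet subclass defined later.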
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 10:34:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 10:34:03 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291434.m8TEY3VE007082@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2601
bsouthey at gmail.com changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |bsouthey at gmail.com
------- Comment #2 from bsouthey at gmail.com 2008-09-29 10:34 EST -------
(In reply to comment #1)
I do think that any general function involving regular expressions should
conform to the Python re module. The reasoning follows Peter's point that a
user should not have to convert the Seq object into a Python string. While I
see the point of the reverse complement and overlapping matches, these are
inconsistent with re module. So I think it would be more valuable to implement
specific methods from the re modules. In this case, the functions should accept
regular expression.
I also do not see the gain for the reverse complement because this is just
another pattern. Also it is potentially confusing because the direction is not
immediately apparent without further computation. In this case I think that
'explicit is better than implicit' (The Zen of Python) so I think the decision
to use the reverse complement must come prior to the use of this method.
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 12:01:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 12:01:51 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
taxon entries in lineage
In-Reply-To:
Message-ID: <200809291601.m8TG1ppa013194@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2475
------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 12:01 EST -------
(In reply to comment #33)
> Also, yes, the _get_taxon_id() function is getting far too long, and should
> probably be restructured as part of this bug.
In BioSQL/Loader.py CVS revision 1.34 I have split the _get_taxon_id() method
in two - ready to look at integrating Eric's code for fetching the NCBI
taxonomy on demand.
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 12:41:10 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 12:41:10 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291641.m8TGfA2F015768@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2601
------- Comment #3 from lpritc at scri.sari.ac.uk 2008-09-29 12:41 EST -------
Make a cup of tea, this is a long one... ;)
Peter:
>Note that any Seq.find() method should be as like the string find method as
>possible for consistency.
Bruce:
> While I
> see the point of the reverse complement and overlapping matches, these are
> inconsistent with re module
I see your points, but I'm not /entirely/ in agreement, here. While I think
that it is nearly always a good thing that the input arguments and returned
results match those that are expected for same-named functions in similar
classes, I think that we may still take the opportunity to implement useful
behaviour that is relevant to biological sequences where the intent doesn't
stray too far from what you'd expect for a string. For example, the ability to
accommodate ambiguous alphabets or regular expressions - not part of
string.find() - would be useful. I think that this approach implements
additional functionality of which the string.find() method's functionality is a
subset, and so could be implemented without breaking the apparent identical
operation of string.find() and Seq.find(). This would facilitate the use of
string-specific third-party modules that could be useful for analysis of
biological sequences, while extending functionality.
Where I begin to disagree is on whether it is always desirable to constrain the
behaviour of these functions for the sake of consistency with other modules,
while still taking time to make them behave differently at all, rather than
just implementing that exact same behaviour, and handling the
biologically-useful stuff in a different method altogether. I like the idea of
making Seq.py more string-like, in part because when I first started using
Biopython, I missed being able to slice, and other conveniently string-y
things.
By way of contrast:
string.find() has the behaviour of only returning a single match - that which
is closest to the string start. This might be useful to some (in ORF-finding,
perhaps), but I expect I would use a finditer() method that returned all
matches (for which there is no equivalent string method) almost exclusively, if
available. I expect that I could cope quite happily with find() doing
different things on pure strings and on Seq objects, but I'd be OK with a
nonstandard finditer() alongside a 100% string-compatible find() as an
alternative to this, though I'd want finditer() to return overlapping matches.
Such overlapping matches, however, do not match re.finditer() behaviour. But,
in this case, the re method's behaviour is constrained for good reasons related
to regular expression implementation, and not reasons related to biological
good sense. I think that there is sufficient reason not to be consistent here,
and instead to return biologically-useful overlapping matches.
The core of my argument here is that we're not just working with strings, but
with string representations of biological objects; that's exactly why we have
this specialised library, and don't just use strings in the first place. I
think that there will be occasions when we should break some syntactic
expectations, where it is appropriate for the problem domain, and that this
*might* (note equivocation) be one of them.
Peter:
>One enhancement is that it might be worth checking
>the search string is valid against the Seq object's alphabet (see also Bug
>2597).
Good point. In the implementation I put up here, if there are any invalid
characters then the string just won't be found, which may be overgenerous to
user error ;) Raising a ValueError or some such to let the user know that the
search alphabet wasn't valid would be very helpful.
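Something along these lines, perhaps. This is only a sketch: check_letters is a made-up helper, and letters stands for whatever the alphabet's .letters property holds:

```python
def check_letters(query, letters):
    """Raise ValueError if query uses any letter outside the given letters."""
    invalid = sorted(set(query) - set(letters))
    if invalid:
        raise ValueError(
            "Search string contains letters not in the alphabet: %s"
            % "".join(invalid)
        )
    return query


checked = check_letters("ACGT", "GATC")
```

A find method could call this before searching, so that a typo in the query fails loudly instead of silently returning no matches.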
Peter:
> To determine if a sequence has a nucleotide alphabet, use the fact that
> any well defined nucleotide alphabet object should be a subclass of
> Bio.Alphabet.NucleotideAlphabet() rather than checking a predefined list.
Fair enough - I didn't know that NucleotideAlphabet existed... I got as far up
the hierarchy as DNAAlphabet and RNAAlphabet, and stopped at working code ;)
Peter:
> However, there is no way of knowing if the sequence is double stranded or
> single sided, so personally I don't like the way your suggested function
> automatically searches the reverse complement strand too.
It just suited my purpose at the time. Whether or not the nucleotide sequence
is single- or double-stranded, people might still want to search for a
complementary sequence; e.g. microarray/PCR/siRNA probes, etc. The method as
written reports the strand on which the match can be found, and the user is
free to discard results as they see fit, which again suited me at the time. A
'strand' argument to the method of 'forward', 'reverse', or 'both', or just
assuming 'both' if not specified would be better, I agree.
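A rough sketch of how such a 'strand' argument might look, working on plain unambiguous DNA strings (the function names and the 0-based return format are illustrative assumptions, not a proposed API):

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Reverse complement of an unambiguous upper-case DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def find_all(text, query):
    """Return 0-based start positions of all (overlapping) occurrences."""
    starts = []
    pos = text.find(query)
    while pos != -1:
        starts.append(pos)
        pos = text.find(query, pos + 1)
    return starts


def find_on_strands(text, query, strand="both"):
    """Return sorted (start, strand) pairs; strand is 1 or -1."""
    if strand not in ("forward", "reverse", "both"):
        raise ValueError("strand must be 'forward', 'reverse' or 'both'")
    hits = []
    if strand in ("forward", "both"):
        hits.extend((start, 1) for start in find_all(text, query))
    if strand in ("reverse", "both"):
        # Search for the reverse complement of the query on the given strand
        rc_query = reverse_complement(query)
        hits.extend((start, -1) for start in find_all(text, rc_query))
    return sorted(hits)


hits = find_on_strands("AACGTTAAC", "AAC")
```

Note that a palindromic query would be reported once per strand at the same position, which may or may not be what the caller wants.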
What drove my implementation above was that, while nucleotide sequence matches
may or may not be of interest in either direction, reverse matches to protein
sequences are definitely (AFAIAC) not that interesting ;)
Bruce:
>I do think that any general function involving regular expressions should
> conform to the Python re module. The reasoning follows Peter's point that a
> user should not have to convert the Seq object into a Python string.
I don't think I understand this point. Would you prefer an re.search() like
implementation that takes a Seq object as its query argument? I don't think
I'd find that as useful, myself, as a method that just takes a string. Such a
method could also maybe parse arguments so as to compile the regex from the
Seq.data attribute though, fulfilling your requirement.
I used regular expression based searching in my implementation for speed, and
strictly speaking a string is also a regular expression, even if it doesn't
have special characters - I didn't see any inconsistency there. My docstring
is maybe a bit misleading about that but, when I wrote it, it wasn't intended
for anyone but me to use. Sorry about that.
Also, I disagree regarding conformance to the re module, particularly as our
use of re is likely to be less general than the re module itself - see above.
> So I think it would be more valuable to implement
> specific methods from the re modules. In this case, the functions should accept
> regular expression.
I would quite like to have a 'true' regular expression search method myself,
with wildcards for nucleotide symbols, but this would have to be implemented
differently to my attempt above: e.g., for proper reverse complement searches,
you'd have to reverse complement the wildcards as well as ambiguity codes.
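As a sketch of the ambiguity-code half of that problem only: the standard IUPAC nucleotide codes can be reverse complemented letter by letter and then expanded into regex character classes. The function names are made up, and real regex syntax in the query (wildcards, quantifiers, groups) is not handled here; that would need a proper parser:

```python
import re

# Standard IUPAC nucleotide ambiguity codes as regex character classes
IUPAC_CLASSES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Complement of each code (e.g. R = A/G pairs with Y = T/C)
IUPAC_COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A",
    "R": "Y", "Y": "R", "S": "S", "W": "W",
    "K": "M", "M": "K", "B": "V", "V": "B",
    "D": "H", "H": "D", "N": "N",
}


def ambiguous_reverse_complement(query):
    """Reverse complement a query made only of IUPAC nucleotide codes."""
    return "".join(IUPAC_COMPLEMENT[code] for code in reversed(query.upper()))


def ambiguous_to_regex(query):
    """Expand IUPAC codes into a regex pattern string."""
    return "".join(IUPAC_CLASSES[code] for code in query.upper())


forward_regex = ambiguous_to_regex("GAR")
reverse_regex = ambiguous_to_regex(ambiguous_reverse_complement("GAR"))
reverse_hit = re.search(reverse_regex, "AACTCA")
```

Here GAR (GAA or GAG) becomes YTC on the other strand, so the reverse pattern matches both TTC and CTC.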
> I also do not see the gain for the reverse complement because this is just
> another pattern.
The gain was that I needed matches to my patterns of interest on the sequence
in either direction, and I only cared which strand they lay on for reasons of
locating them. Reverse complementing the query is usually quicker than reverse
complementing the genome on which you search. Assuming you're searching on a
genome, of course ;)
> Also it is potentially confusing because the direction is not
> immediately apparent without further computation.
I'm not sure I understand you: in the above code, the method returns the strand
on which the match is found, along with all the other data. The computation
required to handle this is the same as that to find the start and end points:
parse an integer from the tuple. I'm not intending that the return type should
be set in stone and, as I mentioned, it was just a handy step in the creation
of SeqFeatures in the parent SeqRecord.
> In this case I think that
> 'explicit is better than implicit' (The Zen of Python) so I think the decision
> to use the reverse complement must come prior to the use of this method.
In the spirit of quoted arguments from authority: "A foolish consistency is the
hobgoblin of little minds" (Python Style Guide) ;)
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 16:47:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 16:47:50 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809292047.m8TKlokn000682@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2601
------- Comment #4 from bsouthey at gmail.com 2008-09-29 16:47 EST -------
(In reply to comment #3)
> Make a cup of tea, this is a long one... ;)
>
> Peter:
> >Note that any Seq.find() method should be as like the string find method as
> >possible for consistency.
> Bruce:
> > While I
> > see the point of the reverse complement and overlapping matches, these are
> > inconsistent with re module
>
> I see your points, but I'm not /entirely/ in agreement, here.
Good, as where is the fun otherwise? :-)
> that it is nearly always a good thing that the input arguments and returned
> results match those that are expected for same-named functions in similar
> classes, I think that we may still take the opportunity to implement useful
> behaviour that is relevant to biological sequences where the intent doesn't
> stray too far from what you'd expect for a string. For example, the ability to
> accommodate ambiguous alphabets or regular expressions - not part of
> string.find() - would be useful. I think that this approach implements
> additional functionality of which the string.find() method's functionality is a
> subset, and so could be implemented without breaking the apparent identical
> operation of string.find() and Seq.find(). This would facilitate the use of
> string-specific third-party modules that could be useful for analysis of
> biological sequences, while extending functionality.
>
> Where I begin to disagree is on whether it is always desirable to constrain the
> behaviour of these functions for the sake of consistency with other modules,
> while still taking time to make them behave differently at all, rather than
> just implementing that exact same behaviour, and handling the
> biologically-useful stuff in a different method altogether. I like the idea of
> making Seq.py more string-like, in part because when I first started using
> Biopython, I missed being able to slice, and other conveniently string-y
> things.
Okay, so what is still missing with these new changes?
>
> By way of contrast:
> string.find() has the behaviour of only returning a single match - that which
> is closest to the string start. This might be useful to some (in ORF-finding,
> perhaps), but I expect I would use a finditer() method that returned all
> matches (for which there is no equivalent string method) almost exclusively, if
> available. I expect that I could cope quite happily with find() doing
> different things on pure strings and on Seq objects, but I'd be OK with a
> nonstandard finditer() alongside a 100% string-compatible find() as an
> alternative to this, though I'd want finditer() to return overlapping matches.
>
It is not correct to compare finditer (a re method) with find (a string method),
or for that matter with re.match or re.search. (I do notice some confusion
between these similar but different functions, but there are numerous web pages
that discuss when one or the other should be used.) I do understand the
interest, but you raised two different points in this bug. The first is finding
one match (such as re.search or re.match) versus finding all matches (such as
re.findall or re.finditer). I fully agree with having both. The second is
overlapping matches: I definitely think that the user, not the developer, has
to decide whether or not they want them. There is no such option under this
implementation.
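A minimal sketch of what such a user-controlled option could look like, using
only the standard re module. The function name find_all and the overlapping
argument are made up for illustration and are not part of the proposed
Seq method:

```python
import re

def find_all(pattern, text, overlapping=False):
    """Return the start position of every match of pattern in text."""
    if overlapping:
        # A zero-width lookahead consumes no characters, so the scan
        # resumes one position after each hit and overlaps are reported.
        regex = re.compile("(?=(?:%s))" % pattern)
    else:
        # Plain finditer behaviour: matches never overlap.
        regex = re.compile(pattern)
    return [m.start() for m in regex.finditer(text)]

print(find_all("AA", "AAAA"))                    # [0, 2]
print(find_all("AA", "AAAA", overlapping=True))  # [0, 1, 2]
```

Only start positions are returned here, because a lookahead match has zero
width; recovering the end point would need a capturing group inside the
lookahead.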
> Such overlapping matches, however, do not match re.finditer() behaviour. But,
> in this case, the re method's behaviour is constrained for good reasons related
> to regular expression implementation, and not reasons related to biological
> good sense. I think that there is sufficient reason not to be consistent here,
> and instead to return biologically-useful overlapping matches.
I am neither for nor against having a method that returns overlapping matches;
rather, I am against returning overlapping matches being the only choice.
>
> The core of my argument here is that we're not just working with strings, but
> with string representations of biological objects; that's exactly why we have
> this specialised library, and don't just use strings in the first place. I
> think that there will be occasions when we should break some syntactic
> expectations, where it is appropriate for the problem domain, and that this
> *might* (note equivocation) be one of them.
>
> Peter:
> >One enhancement is that it might be worth checking
> >the search string is valid against the Seq object's alphabet (see also Bug
> >2597).
>
> Good point. In the implementation I put up here, if there are any invalid
> characters then the string just won't be found, which may be overgenerous to
> user error ;) Raising a ValueError or some such to let the user know that the
> search alphabet wasn't valid would be very helpful.
>
> Peter:
> > To determine if a sequence has a nucleotide alphabet, use the fact that
> > any well defined nucleotide alphabet object should be a subclass of
> > Bio.Alphabet.NucleotideAlphabet() rather than checking a predefined list.
>
> Fair enough - I didn't know that NucleotideAlphabet existed... I got as far up
> the hierarchy as DNAAlphabet and RNAAlphabet, and stopped at working code ;)
>
> Peter:
> > However, there is no way of knowing if the sequence is double stranded or
> > single sided, so personally I don't like the way your suggested function
> > automatically searches the reverse complement strand too.
>
> It just suited my purpose at the time. Whether or not the nucleotide sequence
> is single- or double-stranded, people might still want to search for a
> complementary sequence; e.g. microarray/PCR/siRNA probes, etc. The method as
> written reports the strand on which the match can be found, and the user is
> free to discard results as they see fit, which again suited me at the time. A
> 'strand' argument to the method of 'forward', 'reverse', or 'both', or just
> assuming 'both' if not specified would be better, I agree.
>
> What drove my implementation above was that, while nucleotide sequence matches
> may or may not be of interest in either direction, reverse matches to protein
> sequences are definitely (AFAIAC) not that interesting ;)
>
> Bruce:
> >I do think that any general function involving regular expressions should
> > conform to the Python re module. The reasoning follows Peter's point that a
> > user should not have to convert the Seq object into a Python string.
>
> I don't think I understand this point. Would you prefer an re.search() like
> implementation that takes a Seq object as its query argument? I don't think
> I'd find that as useful, myself, as a method that just takes a string. Such a
> method could also maybe parse arguments so as to compile the regex from the
> Seq.data attribute though, fulfilling your requirement.
What I mean is that a user should be able to either specify the pattern or
specify a regular expression object. In either case the optional flags that are
often useful to have like ignorecase are ignored.
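As a rough illustration of accepting either form, and reading the remark about
flags as a wish that they be available (that reading is an assumption), a
helper along these lines would do. The name as_regex is hypothetical:

```python
import re

def as_regex(pattern, flags=0):
    """Accept a pattern string or an already compiled regular expression."""
    if hasattr(pattern, "finditer"):
        # Already compiled: use it as given. Its flags were fixed when it
        # was compiled, so the flags argument is not applied here.
        return pattern
    return re.compile(pattern, flags)

print(as_regex("acgt", re.IGNORECASE).search("TTACGTT").span())  # (2, 6)
```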
>
> I used regular expression based searching in my implementation for speed, and
> strictly speaking a string is also a regular expression, even if it doesn't
> have special characters - I didn't see any inconsistency there. My docstring
> is maybe a bit misleading about that but, when I wrote it, it wasn't intended
> for anyone but me to use. Sorry about that.
>
> Also, I disagree regarding conformance to the re module, particularly as our
> use of re is likely to be less general than the re module itself - see above.
>
> > So I think it would be more valuable to implement
> > specific methods from the re modules. In this case, the functions should accept
> > regular expression.
>
> I would quite like to have a 'true' regular expression search method myself,
> with wildcards for nucleotide symbols, but this would have to be implemented
> differently to my attempt above: e.g., for proper reverse complement searches,
> you'd have to reverse complement the wildcards as well as ambiguity codes.
>
> > I also do not see the gain for the reverse complement because this is just
> > another pattern.
>
> The gain was that I needed matches to my patterns of interest on the sequence
> in either direction, and I only cared which strand they lay on for reasons of
> locating them. Reverse complementing the query is usually quicker than reverse
> complementing the genome on which you search. Assuming you're searching on a
> genome, of course ;)
>
> > Also it is potentially confusing because the direction is not
> > immediately apparent without further computation.
>
> I'm not sure I understand you: in the above code, the method returns the strand
> on which the match is found, along with all the other data. The computation
> required to handle this is the same as that to find the start and end points:
> parse an integer from the tuple. I'm not intending that the return type should
> be set in stone and, as I mentioned, it was just a handy step in the creation
> of SeqFeatures in the parent SeqRecord.
Regardless of what a user actually wants, they must wait for two searches along
the sequence. After that finishes, the user must examine each and every entry
(due to the match_locations.sort()) to find the strand, regardless of what they
want to do. I do not see any advantage in this over someone calling the
function twice to get match_locations and rev_locations, then doing
'match_locations += rev_locations' and match_locations.sort().
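A sketch of that explicit two-call approach, assuming a single-strand search
function that returns (start, end, strand) tuples. The helper names and the
plain ACGT complement table are illustrative only and are not taken from the
code attached to this bug:

```python
import re

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def reverse_complement(dna):
    # Unambiguous upper-case DNA only, to keep the sketch short.
    return "".join(COMPLEMENT[base] for base in reversed(dna))

def find_on_strand(query, target, strand):
    """Return (start, end, strand) tuples for one strand only."""
    if strand == -1:
        # Reverse complement the short query, not the long target.
        query = reverse_complement(query)
    return [(m.start(), m.end(), strand)
            for m in re.finditer(re.escape(query), target)]

target = "AACGTTTACGAA"
match_locations = find_on_strand("ACG", target, +1)
rev_locations = find_on_strand("ACG", target, -1)
match_locations += rev_locations
match_locations.sort()
print(match_locations)  # [(1, 4, 1), (2, 5, -1), (7, 10, 1)]
```

A caller who only wants one strand makes one call and pays for one search.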
>
> > In this case I think that
> > 'explicit is better than implicit' (The Zen of Python) so I think the decision
> > to use the reverse complement must come prior to the use of this method.
>
> In the spirit of quoted arguments from authority: "A foolish consistency is the
> hobgoblin of little minds" (Python Style Guide) ;)
>
Okay, then more Zen:
"In the face of ambiguity, refuse the temptation to guess."
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 01:53:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 01:53:02 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809300553.m8U5r2OB000429@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #24 from drpatnaik at yahoo.com 2008-09-30 01:53 EST -------
[I myself do not need to work on a Windows machine, and I am following this bug
out of curiosity.]
I have tried the pywin32-short-path-name approach.
It seems I have to use win32api.GetShortPathName on the values for the BLAST
exe as well as the database and the input file. If I don't do so, the process
seems to hang: I see only a blinking cursor for 2-3 minutes when I just quit
the console application (I know the BLAST process itself takes less than a
second to finish). The error file has '^C' and the result file is empty.
But because the specified database ('mine') is really not a file,
GetShortPathName fails, unless I use code like:
my_blast_db = win32api.GetShortPathName('C:/Documents and Settings/patnaik/My
Documents/blast/bin/mine.nin')[:-4]
Even then I see the hang, and get a similar [or empty] error file as before.
With a print command in NCBIStandalone.py, I can see that the value being
passed on to the system is:
C:/DOCUME~1/patnaik/MYDOCU~1/blast/bin/blastall.exe -p blastn -d
C:/DOCUME~1/patnaik/MYDOCU~1/blast/bin/HAIRPI~1 -i
C:/DOCUME~1/patnaik/MYDOCU~1/blast/bin/30a.seq -m 7
This value is right as it works by itself.
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 03:45:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 03:45:14 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809300745.m8U7jEQ3008574@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #25 from drpatnaik at yahoo.com 2008-09-30 03:45 EST -------
Using subprocess I am now able to get Biopython to run local BLAST successfully
in Windows when spaces are present in file-path values.
With a conditional statement like the one below, the following type of
modification will still let Biopython remain compatible with old versions of
Python that cannot use subprocess:
# replace lines 1680-1682 of CVS 1.78 of Bio/Blast/NCBIStandalone.py with these
try:
    import subprocess
    my_process = subprocess.Popen(" ".join([blastcmd] + params),
                                  stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                  stderr=subprocess.PIPE, shell=False)
    r, e = my_process.communicate('through stdin to stdout')
    return r, e
except:
    w, r, e = os.popen3(" ".join([blastcmd] + params))
    w.close()
    return File.UndoHandle(r), File.UndoHandle(e)
The test.py file I tested:
# Note the unusual ways to specify the database, input file, and BLAST locations
my_blast_db = r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.db\""'
my_blast_file = r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq"'
my_blast_exe = r"C:\Documents and Settings\patnaik\My Documents\blast\bin\blastall.exe"
from Bio.Blast import NCBIStandalone
my_blast_result, my_blast_error = NCBIStandalone.blastall(my_blast_exe,
    "blastn", my_blast_db, my_blast_file)
# Note the way the save_file path is specified
save_file = open(r'C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_error', "w")
save_file.write(my_blast_error)
save_file.close()
save_file = open(r'C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_result', "w")
save_file.write(my_blast_result)
save_file.close()
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 08:49:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 08:49:45 -0400
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic
alignment, e.g. align[1:2, 5:-5]
In-Reply-To:
Message-ID: <200809301249.m8UCnjXI001499@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2551
------- Comment #1 from jblanca at btc.upv.es 2008-09-30 08:49 EST -------
Created an attachment (id=1001)
--> (http://bugzilla.open-bio.org/attachment.cgi?id=1001&action=view)
new Alignment implementation example
Implementation of an Alignment-like class (here named Assembly) capable of
covering the cases proposed by Peter and also capable of holding sequences that
do not start and end at the same location.
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 08:55:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 08:55:26 -0400
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic
alignment, e.g. align[1:2, 5:-5]
In-Reply-To:
Message-ID: <200809301255.m8UCtQ9R001967@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2551
------- Comment #2 from jblanca at btc.upv.es 2008-09-30 08:55 EST -------
In comment #1 I present a class that could be easily adapted to be
compatible with the Biopython Alignment API but with some extra capabilities.
It can hold sequences that start and end at different places (like the EST
assemblies).
It can also have a consensus, although that's a minor improvement.
And it can hold in the rows any sequence-like class such as Seq, str, list or
tuple. This would be, I hope, quite future proof; we could also add Quality,
SeqWithQuality or whatever.
It doesn't deal with the alphabet; I think that a subclass should be created to
add that capability. I haven't added alphabets to this class to keep the
compatibility with all the sequence-like objects that have no alphabet.
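To make the idea concrete without opening the attachment, here is a small
sketch of rows that carry a start column and support align[1:2, 5:-5] style
slicing. It is a hypothetical stand-in written for this note, not the attached
Assembly class; the name OffsetRows and its layout are assumptions:

```python
class OffsetRows(object):
    """Hypothetical sketch; not the attached Assembly class."""

    def __init__(self, rows):
        # Each row is a (start_column, sequence) pair; the sequence can be
        # anything sliceable with a length (str, list, tuple, ...).
        self.rows = list(rows)

    def width(self):
        # Rightmost column covered by any row (0 when there are no rows).
        return max([start + len(seq) for start, seq in self.rows] or [0])

    def __getitem__(self, key):
        row_slice, col_slice = key
        first, last, step = col_slice.indices(self.width())
        if step != 1:
            raise ValueError("this sketch only handles contiguous columns")
        selected = []
        for start, seq in self.rows[row_slice]:
            lo = max(first, start)
            hi = min(last, start + len(seq))
            if lo < hi:
                # Keep only the overlap, re-based to the new first column.
                selected.append((lo - first, seq[lo - start:hi - start]))
        return OffsetRows(selected)


assembly = OffsetRows([(0, "ACGTACGT"), (2, "GTAC"), (4, [1, 2, 3, 4])])
print(assembly[0:2, 1:5].rows)   # [(0, 'CGTA'), (1, 'GTA')]
print(assembly[2:3, 5:-1].rows)  # [(0, [2, 3])]
```

Rows that do not reach into the requested columns are dropped in this sketch;
keeping them as empty rows would be an equally defensible choice.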
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 10:32:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 10:32:45 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing
taxon entries in lineage
In-Reply-To:
Message-ID: <200809301432.m8UEWjtq009707@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2475
------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-30 10:32 EST -------
BioSQL/BioSeqDatabase.py revision 1.18 and BioSQL/Loader.py revision 1.35 in
CVS include what I think is a working version of the BioSQL loader which can
fetch taxonomy from the NCBI via Bio.Entrez. This is based in part on Eric's
code but includes several additional features (e.g. recording the genetic code
which the NCBI provides with the taxonomy data).
When the NCBI fetching is disabled, but an NCBI taxon ID is known, only a
minimal taxonomy record is recorded (without the lineage). This can then be
completed by running the BioSQL load_ncbi_taxonomy.pl script.
There is still scope for improvement, e.g.
* _get_taxon_id_from_ncbi_lineage doesn't really need to be recursive.
* When there is no NCBI taxon ID present in the SeqRecord this code will not
attempt to search for the taxonomy based on the species name. I'm not sure if
doing this search is a good idea or not...
* We could make an Entrez.efetch call for each row added to the table (rather
than as currently just one call per lineage) which should allow us to fetch the
genetic code for all the entries. On balance I think this is not needed, and
can be populated by the BioSQL load_ncbi_taxonomy.pl script anyway.
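On the first point, a non-recursive walk could look roughly like the sketch
below. It is only an illustration: a plain dict stands in for the taxon table,
names are treated as unique keys, and the function name and id scheme are made
up, so none of this reflects the actual Loader.py code or the BioSQL schema:

```python
def get_taxon_id_from_lineage(lineage, taxon_table):
    """Return the id of the last lineage entry, adding missing ancestors.

    lineage     -- list of names ordered from root to leaf
    taxon_table -- dict of name -> (taxon_id, parent_id); a stand-in for
                   the database table, assumed for this sketch only
    """
    parent_id = None
    for name in lineage:
        if name in taxon_table:
            # Reuse the existing entry rather than adding a duplicate.
            taxon_id = taxon_table[name][0]
        else:
            taxon_id = len(taxon_table) + 1
            taxon_table[name] = (taxon_id, parent_id)
        parent_id = taxon_id
    return parent_id

table = {}
print(get_taxon_id_from_lineage(["Eukaryota", "Metazoa", "Homo"], table))  # 3
print(get_taxon_id_from_lineage(["Eukaryota", "Metazoa", "Mus"], table))   # 4
```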
This has passed the unit tests and my own initial testing, and I intend to use
this code a lot more this week/next week. However, it would be great to have
some additional testing of this as is.
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 11:34:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 11:34:37 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809301534.m8UFYbEP014418@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-30 11:34 EST -------
(In reply to comment #25)
> Using subprocess I am now able to get Biopython to run local
> BLAST successfully in Windows when spaces are present in
> file-path values.
Good - but have you been able to try your code on Linux or the Mac?
> With a conditional statement like the one below, following
> type of modification will still let Biopython remain
> compatible with old versions of Python that cannot use
> subprocess:
If we can take advantage of the subprocess module in a cross platform way, then
yes, a try/except fall back for python 2.3 would be nice. As of
Blast/NCBIStandalone.py CVS revision 1.79, there is now only one place in this
module where such a change can be applied (rather than three places).
> # replace lines 1680-1682 of CVS 1.78 of Bio/Blast/NCBIStandalone.py with
> these
>
> try:
> import subprocess
> my_process = subprocess.Popen(" ".join([blastcmd] + params),
> stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
> shell=False)
Using shell=False works while shell=True fails on Windows (I tested on Windows
XP with Python 2.5 from IDLE). However, the opposite is true on Mac OS X with
python 2.5 from IDLE. This is a pain.
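A likely explanation is that, with shell=False, Unix treats a single command
string as the name of the program to run, whereas Windows hands the string on
to the system as a whole command line. Passing a list of arguments instead of
one joined string should behave the same way on both, and arguments containing
spaces then need no quoting. A minimal sketch, in which sys.executable stands
in for the blastall path and the argument value is invented:

```python
import subprocess
import sys

# sys.executable stands in for the BLAST executable. The last item is one
# argument containing a space, passed through without any quoting.
command = [sys.executable, "-c", "import sys; print(sys.argv[1])",
           "My Documents/hairpin db"]
child = subprocess.Popen(command, stdout=subprocess.PIPE, shell=False,
                         universal_newlines=True)
out = child.stdout.read()  # fine here because the output is tiny
child.stdout.close()
child.wait()
print(out.strip())  # My Documents/hairpin db
```

Whether BLAST itself then copes with a database path containing spaces is a
separate question that this does not answer.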
Also you don't need the stdin=... argument as we don't want to give BLAST any
piped input.
> r, e = my_process.communicate('through stdin to stdout')
> return r, e
First of all, there is no reason to pipe the text "through stdin to stdout"
into BLAST's standard input. I guess you blindly cut and pasted this from a
google search. Instead just:
r, e = my_process.communicate()
However, you should NOT be using the communicate method, as it will read in and
buffer all the output and wait for BLAST to finish. As BLAST output (especially
XML output) can be large (gigabytes), we must not load this into memory.
Instead:
r = my_process.stdout
e = my_process.stderr
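Put together, a sketch of handing back the two handles and consuming the output
a line at a time might look like this. The function name run_streaming is
hypothetical and sys.executable again stands in for the BLAST command:

```python
import subprocess
import sys

def run_streaming(command):
    """Start command (a list of arguments); return (child, stdout, stderr)."""
    child = subprocess.Popen(command, stdout=subprocess.PIPE,
                             stderr=subprocess.PIPE,
                             universal_newlines=True)
    return child, child.stdout, child.stderr

command = [sys.executable, "-c", "print('hit 1'); print('hit 2')"]
child, r, e = run_streaming(command)
count = 0
for line in r:      # one line in memory at a time
    count += 1
r.close()
e.close()
child.wait()
print(count)  # 2
```

One caveat with two pipes: if the child writes a lot to stderr while only
stdout is being read, it can stall once the stderr pipe fills, so a real
implementation would need to drain both.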
> except:
> w, r, e = os.popen3(" ".join([blastcmd] + params))
> w.close()
> return File.UndoHandle(r), File.UndoHandle(e)
>
> The test.py file I tested:
>
> # Note the unusual ways to specify the database, input file, and BLAST
> locations
> my_blast_db = r'"\"C:\Documents and Settings\patnaik\My
> Documents\blast\bin\hairpin.db\""'
> my_blast_file = r'"C:\Documents and Settings\patnaik\My
> Documents\blast\bin\30a.seq"'
> my_blast_exe = r"C:\Documents and Settings\patnaik\My
> Documents\blast\bin\blastall.exe"
I've been having trouble with specifying BLAST databases with spaces in the
path. Have you been able to demonstrate this with more than one database?
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 18:19:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 18:19:22 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows
file-path values
In-Reply-To:
Message-ID: <200809302219.m8UMJMa8016035@portal.open-bio.org>
http://bugzilla.open-bio.org/show_bug.cgi?id=2480
------- Comment #27 from drpatnaik at yahoo.com 2008-09-30 18:19 EST -------
As I mentioned earlier, I am new to Python, and my usage of subprocess is
indeed imperfect.
I tried the subprocess routine through a test.py file on Mac OS X 10.5.5 with
Python 2.5.2, but without using Biopython. I had to use 'shell=True';
otherwise, with 'shell=False', I get:
  File "/Lab/Laboratory/Libs/Python/lib/python2.5/subprocess.py", line 594, in __init__
    errread, errwrite)
  File "/Lab/Laboratory/Libs/Python/lib/python2.5/subprocess.py", line 1091, in _execute_child
    raise child_exception
With 'shell=True', it works even when there is a space in the file-path/names
of the BLAST executable, the database or the input sequence file (the escaping
of the spaces needs to be properly done).
--- test.py ---
# Note escaping of the space characters vary for
my_blast_cmd = r'"/Lab/Laboratory/Libs/NCBI_blast/bin/Change loc/blastall" -p blastn -d "\"/Lab/Laboratory/Libs/NCBI_blast/data/Change loc/My db\"" -i /Lab/Laboratory/Libs/NCBI_blast/data/My\ seq.txt -m 7'
import subprocess
my_process = subprocess.Popen(my_blast_cmd, stdin=subprocess.PIPE,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                              shell=True)
my_blast_result, my_blast_error = my_process.communicate('through stdin to stdout')
save_file = open('/Users/patnaik/Desktop/my_blast_error', "w")
save_file.write(my_blast_error)
save_file.close()
save_file = open('/Users/patnaik/Desktop/my_blast_result', "w")
save_file.write(my_blast_result)
save_file.close()
From mjldehoon at yahoo.com Tue Sep 30 20:24:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Sep 2008 17:24:18 -0700 (PDT)
Subject: [Biopython-dev] Numpy conversion
In-Reply-To: <37659.57326.qm@web62402.mail.re1.yahoo.com>
Message-ID: <228132.43778.qm@web62402.mail.re1.yahoo.com>
Bio.kNN is the only module that imports Bio.distance. Bio.distance is written in Python, but it also imports a C version of Bio.distance if it is available. From the comments in the code, I gather that the purpose of the C-version is to get fast distance calculations without using Numeric / NumPy. However, Bio.kNN itself uses Numeric / NumPy, which defeats the purpose of the C-version of Bio.distance.
I would therefore like to propose to add a NumPy-aware version of the code in Bio.distance to Bio.kNN, and to deprecate Bio.distance.
Any objections?
--Michiel.
--- On Thu, 9/18/08, Michiel de Hoon wrote:
> From: Michiel de Hoon
> Subject: Re: [Biopython-dev] Numpy conversion
> To: "Peter"
> Cc: biopython-dev at biopython.org
> Date: Thursday, September 18, 2008, 10:10 AM
> > I've not used it myself, but it sounds handy.
> Michiel,
> > does this overlap at all with your clustering module?
>
> No, it doesn't. Bio.Cluster contains unsupervised
> clustering methods only. The k-nearest neighbors in Bio.kNN
> is a supervised learning method.
>
> --Michiel.
>
> --- On Wed, 9/17/08, Peter wrote:
>
> > From: Peter
> > Subject: Re: [Biopython-dev] Numpy conversion
> > To: mjldehoon at yahoo.com
> > Cc: biopython-dev at biopython.org
> > Date: Wednesday, September 17, 2008, 10:29 AM
> > On Wed, Sep 17, 2008 at 3:13 PM, Michiel de Hoon wrote:
> > > Hi everybody,
> > >
> > > I am now looking at the pure-python modules that make
> > > use of Numerical Python / NumPy.
> > > Bio.kNN is one of them; this also happens to be the
> > > only module that imports Bio.distance,
> > > which also depends on NumPy.
> > >
> > > What I am not sure about is the usage of Bio.kNN. A
> > > quick google search didn't reveal much,
> > > suggesting that it is not widely used. Bio.kNN
> > > currently is not documented in the tutorial, but
> > > the code itself is reasonably well documented.
> > >
> > > How do you guys feel about this module? Should we keep
> > > it?
> > >
> >
> > I've not used it myself, but it sounds handy. Michiel,
> > does this overlap at all with your clustering module?
> >
> > Peter
>
>
>