From bugzilla-daemon at portal.open-bio.org Tue Sep 2 09:06:51 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Sep 2008 09:06:51 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors
In-Reply-To:
Message-ID: <200809021306.m82D6p9i021009@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-02 09:06 EST -------
Hi Frank,

Did you get a chance to look at that code for named ancestors?

Thanks

Peter

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Sep 2 10:05:17 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Sep 2008 10:05:17 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors
In-Reply-To:
Message-ID: <200809021405.m82E5HRM025041@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

------- Comment #4 from cymon.cox at gmail.com 2008-09-02 10:05 EST -------
Hi Peter,

> Can I ask if you've actually come across trees with named ancestor nodes in
> "real life"? That would make this bug more important. If so, the name of the
> tool would be interesting,

P4 (http://code.google.com/p/p4-phylogenetics/) is the only one I'm aware of.

As Frank implies, the labels at nodes aren't necessarily names of ancestors but
rather are just labels that can be any text. In P4 they are just a string
attribute of the node object. P4 uses them primarily to aid tree drawing.
Support indices in phylogenetics are properties of branches and this is fine in
an unrooted tree context. But most systematists want to orientate the tree, ie.
root it informally, and refer to a particular node having the support value of
its subtending branch. It's therefore useful to transfer the branch support
values to node labels before drawing the tree.

> an example tree file would be great to add to
> Biopython as a test case.

How about this:

             +--------------3:t9
      +------2:B
      |      |     +----------------5:t8
      |      +-----4:C
+-----1:A          +---------------6:t4
|     |
|     +---------------7:t6
|
|------------------8:t2
0
|            +------------11:t0
|     +-----10:E
|     |      +-----------------12:t7
|     |
+-----9:D           +------15:t5
      |      +------14:G
      +-----13:F    +-----------------16:t3
             |
             +--------17:t1

"""
#NEXUS

begin taxa;
dimensions ntax=10;
taxlabels t0 t1 t2 t3 t4 t5 t6 t7 t8 t9;
end;

begin trees;
tree random = [&U] (((t9:0.385832, (t8:0.445135, t4:0.41401)C:0.024032)B:0.041436, t6:0.392496)A:0.0291131, t2:0.497673, ((t0:0.301171, t7:0.482152)E:0.0268148, ((t5:0.0984167, t3:0.488578)G:0.0349662, t1:0.130208)F:0.0318288)D:0.0273876);
end;
"""

(Hi Frank)

Cheers, Cymon.

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From bugzilla-daemon at portal.open-bio.org Tue Sep 2 11:44:11 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 2 Sep 2008 11:44:11 -0400
Subject: [Biopython-dev] [Bug 2543] Bio.Nexus.Trees can't handle named ancestors
In-Reply-To:
Message-ID: <200809021544.m82FiBej030561@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2543

------- Comment #5 from fkauff at biologie.uni-kl.de 2008-09-02 11:44 EST -------
(In reply to comment #3)

Hi Peter,

haven't done anything yet. The previously mentioned code works differently: it
assigns values to nodes within a [&...] comment, rather than names to nodes.
Assigning names to nodes can be very useful, but as Cymon mentions, P4 seems to
be the only program that can handle them.
In my opinion, naming nodes is a feature, and I would not regard the lack of this feature as a bug. But I'll have a look at the code and see how easy this can be changed. It would actually be nice if P4 and Bio.Nexus, both being python programs, could read each other's trees. (Hi Cymon :-) ) Frank > Hi Frank, > > Did you get a chance to look at that code for named ancestors? > > Thanks > > Peter > -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Sep 2 11:49:09 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Sep 2008 16:49:09 +0100 Subject: [Biopython-dev] Preparing for Biopython 1.48 Message-ID: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com> Dear all, Is there anything we need to address before starting to prepare Biopython 1.48? This is likely to be the final Numeric only release of Biopython, with the following release hopefully supporting both Numeric and numpy in some way. (As an aside, once we support numpy, having scipy as an optional dependency for the statistics Tiago wanted to use in Bio.PopGen does seem less onerous.) Regarding potentially deprecating Bio.Mindy and Martel, I propose we do this in the following release (i.e. after Biopython 1.48). We can then drop the mxTextTools dependency completely. While I would like to address some of the enhancements (esp Bug 2530) these can wait. Ignoring the enhancements, there are several "small" issues on bugzilla that could be dealt with, but nothing that I think warrants delaying the release. One question: Currently Bio.SeqIO in CVS has partial support for writing GenBank files (basically the sequence and minimal annotation - no references, no features). 
I don't want to rush something out without proper testing, so do people think
it would be better to ship with this partial support, or temporarily disable it
(a one line change in Bio/SeqIO/__init__.py to the _FormatToWriter dictionary,
and probably refreshing the expected unit test output)?

Comments and suggestions welcome!

Thanks,

Peter

From tiagoantao at gmail.com Tue Sep 2 14:25:31 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 2 Sep 2008 19:25:31 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com>
Message-ID: <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com>

Hi All,

On Tue, Sep 2, 2008 at 4:49 PM, Peter wrote:

> (As an aside, once we support numpy, having scipy as an optional
> dependency for the statistics Tiago wanted to use in Bio.PopGen does
> seem less onerous.)
>

First, my apologies for not reporting back from BOSC, but I was on a
conference/professional visit spree for the last 3 months and returned last
month. Basically it was not OK: I arrived there from a previous conference and
did the presentation with little sleep; it was probably the sloppiest
presentation in my whole life. My sincere apologies.

On a better front, I have a lot of new content for Bio.PopGen. A few remarks:
1. No documentation and testing done, so I will skip adding content to 1.48.
But I will surely add to 1.49.
2. None of the new content relies on scipy (as there was no agreement on
that), but being able to use scipy would make things much easier. Most of
anything that can be called "population genetics" is nothing more than
statistics (statistics were invented because of population genetics). So a
change in policy would be welcomed (and would make Bio.PopGen really useful
for a wide audience - currently it has only niche users).
In another front, we published a paper using content from Bio.PopGen 1.44 http://www.biomedcentral.com/1471-2105/9/323 Regards, Tiago From biopython at maubp.freeserve.co.uk Tue Sep 2 15:05:31 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 2 Sep 2008 20:05:31 +0100 Subject: [Biopython-dev] Preparing for Biopython 1.48 In-Reply-To: References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com> <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com> Message-ID: <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com> Tiago wrote: >> First, my apologies for not reporting back from BOSC,but I was in a >> conference/professional visit spree for the last 3 months, returned last >> most. Basically it was not OK: I arrived there from a previous conference >> and did the presentation without little sleep, it was probably the >> sloppiest presentation in my whole life. My sincere apologies. I hardly dared ask how you felt at the end of your almost round the world trip ;) Jared wrote: > Despite Tiago's self-criticism I thought his BOSC presentation was fine and > up to par with the rest of them. > > jared That sounds much more positive :) This reminds me that I could/should make a PDF version of the BOSC 2008 slides to go online here: http://biopython.org/wiki/Documentation#Presentations >> On a better front, I have a lot of new content for Bio.PopGen, a few >> remarks: >> 1. No documentation and testing done, so I will skip adding content to >> 1.48. But I will surely add to 1.49. That sounds sensible, and another reason to get Biopython 1.48 out soon. Depending how my day goes tomorrow, I could try then. >> 2. None of the new content relies on scipy (as there was no agreement on >> that), but being able to use scipy would make things much easier. Most of >> anything that can be called "population genetics" is nothing more than >> statistics (statistics were invented because of population genetics). 
So a >> change in policy would be welcomed (and would make Bio.PopGen really >> useful for a wide audience - currently it has only niche users). Let's get the move from Numeric to NumPy done after Biopython 1.48, and re-open the possible SciPy dependency question then. >> In another front, we published a paper using content from Bio.PopGen 1.44 >> http://www.biomedcentral.com/1471-2105/9/323 Excellent, Peter From jflatow at northwestern.edu Tue Sep 2 14:42:52 2008 From: jflatow at northwestern.edu (Jared Flatow) Date: Tue, 2 Sep 2008 13:42:52 -0500 Subject: [Biopython-dev] Preparing for Biopython 1.48 In-Reply-To: <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com> References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com> <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com> Message-ID: Despite Tiago's self-criticism I thought his BOSC presentation was fine and up to par with the rest of them. jared On Sep 2, 2008, at 1:25 PM, Tiago Ant?o wrote: > Hi All, > > On Tue, Sep 2, 2008 at 4:49 PM, Peter > wrote: > >> (As an aside, once we support numpy, having scipy as an optional >> dependency for the statistics Tiago wanted to use in Bio.PopGen does >> seem less onerous.) >> > > First, my apologies for not reporting back from BOSC,but I was in a > conference/professional visit spree for the last 3 months, returned > last > most. Basically it was not OK: I arrived there from a previous > conference > and did the presentation without little sleep, it was probably the > sloppiest > presentation in my whole life. My sincere apologies. > > On a better front, I have a lot of new content for Bio.PopGen, a few > remarks: > 1. No documentation and testing done, so I will skip adding content > to 1.48. > But I will surely add to 1.49. > 2. None of the new content relies on scipy (as there was no > agreement on > that), but being able to use scipy would make things much easier. 
> Most of
> anything that can be called "population genetics" is nothing more than
> statistics (statistics were invented because of population
> genetics). So a
> change in policy would be welcomed (and would make Bio.PopGen really
> useful
> for a wide audience - currently it has only niche users).
>
>
> In another front, we published a paper using content from Bio.PopGen
> 1.44
> http://www.biomedcentral.com/1471-2105/9/323
>
> Regards,
> Tiago
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>

From tiagoantao at gmail.com Tue Sep 2 15:29:55 2008
From: tiagoantao at gmail.com (Tiago Antão)
Date: Tue, 2 Sep 2008 20:29:55 +0100
Subject: [Biopython-dev] Preparing for Biopython 1.48
In-Reply-To: <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com> <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com> <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com>
Message-ID: <6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com>

On Tue, Sep 2, 2008 at 8:05 PM, Peter wrote:

> This reminds me that I could/should make a PDF version of the BOSC
> 2008 slides to go online here:
> http://biopython.org/wiki/Documentation#Presentations
>

http://www.slideshare.net/tiago/bosc-2008-biopython
It has been there for a month, but I completely forgot to tell you.
Tiago From bugzilla-daemon at portal.open-bio.org Wed Sep 3 12:46:30 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 3 Sep 2008 12:46:30 -0400 Subject: [Biopython-dev] [Bug 2578] New: The GenBank SeqRecord parser does not record module type or if circular Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2578 Summary: The GenBank SeqRecord parser does not record module type or if circular Product: Biopython Version: 1.47 Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk Filing this bug after discussion on the mailing list, where the issue was raised by Chris Lasher: http://lists.open-bio.org/pipermail/biopython/2008-September/004474.html http://lists.open-bio.org/pipermail/biopython/2008-September/004475.html http://lists.open-bio.org/pipermail/biopython/2008-September/004476.html The LOCUS line at the start of a GenBank record can record the molecule type (DNA, RNA, mRNA, protein etc) and also if the sequence is linear or circular, e.g. LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008 Currently Bio.SeqIO (and Bio.GenBank.FeatureParser if called directly) do not record these two bits of information in the SeqRecord. Bio.SeqIO uses the Bio.GenBank.FeatureParser, which gets passed this information from the Scanner via the residue_type event. This is a combined lump of data containing both the sequence type (DNA, RNA etc) and if it is linear or circular. It is currently only used to determine the Seq alphabet, and has never been recorded. So in addition to not recording if the LOCUS line said the sequence was circular, if the LOCUS line contained cDNA, mRNA, ... this fine detail is also currently lost in the SeqRecord representation. 
On the other hand, the Bio.GenBank.RecordParser stores all this as the record's residue_type property (a single combined field, presumably reflecting the layout of early GenBank files). It would be a logical improvement to record the sequence data (molecule type and if circular) in the SeqRecord's annotations dictionary - perhaps as two fields but we'd need to check if that would be straight forward for EMBL files too. Alternatively, if Biopython included a native CircularSeq object, we could use that explicitly when the sequence is declared as circular. This might be considered a little surprising though. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Sep 3 12:54:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 3 Sep 2008 12:54:35 -0400 Subject: [Biopython-dev] [Bug 2578] The GenBank SeqRecord parser does not record module type or if circular In-Reply-To: Message-ID: <200809031654.m83GsZ5G017770@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2578 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-03 12:54 EST ------- Note that after any change made to record this information, the preliminary GenBank writing support for Bio.SeqIO should also be updated - see Bug 2294. It would also be sensible to see how BioPerl, BioJava etc store this information within BioSQL so that if possible we can do it the same way. I'm assuming this is just a case of picking the same text (term table key) for our annotations dictionary key. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
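As a rough illustration of the two pieces of information discussed in Bug 2578, the sketch below pulls the molecule type and the linear/circular flag out of a LOCUS line by simple whitespace splitting. This is not how the Bio.GenBank Scanner works; the function name parse_locus_line and the dictionary keys are made up for this example, and it assumes the line ends with a division code and a date, as in the NC_002678 example from the bug report.

```python
def parse_locus_line(line):
    """Pull a few fields out of a GenBank LOCUS line (illustrative sketch only).

    Assumes whitespace-separated fields, a "bp" or "aa" unit after the length,
    and a division code plus date as the last two fields.
    """
    fields = line.split()
    if len(fields) < 6 or fields[0] != "LOCUS":
        raise ValueError("Not a recognised LOCUS line: %r" % line)

    # Find the unit token that follows the sequence length.
    unit_index = None
    for index, token in enumerate(fields):
        if token in ("bp", "aa"):
            unit_index = index
            break
    if unit_index is None:
        raise ValueError("No 'bp' or 'aa' unit found in: %r" % line)

    info = {
        "name": fields[1],
        "length": int(fields[unit_index - 1]),
        "molecule_type": None,
        "topology": None,
    }
    # Whatever sits between the unit and the trailing division/date is either
    # the linear/circular flag or the molecule type (DNA, mRNA, ...).
    for token in fields[unit_index + 1:-2]:
        if token.lower() in ("linear", "circular"):
            info["topology"] = token.lower()
        else:
            info["molecule_type"] = token
    return info


example = "LOCUS NC_002678 7036071 bp DNA circular BCT 22-JUL-2008"
print(parse_locus_line(example))
```

Keeping the two values as separate entries (rather than one combined residue_type string) mirrors the "perhaps as two fields" idea in the report; whether those would be the right annotation keys is left open here.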
From bugzilla-daemon at portal.open-bio.org Wed Sep 3 12:55:38 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 3 Sep 2008 12:55:38 -0400 Subject: [Biopython-dev] [Bug 2578] The GenBank SeqRecord parser does not record molecule type or if circular In-Reply-To: Message-ID: <200809031655.m83GtcPl017915@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2578 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|The GenBank SeqRecord parser|The GenBank SeqRecord parser |does not record module type |does not record molecule |or if circular |type or if circular ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-03 12:55 EST ------- Fixed the typo in "molecule" for the bug title. Whoops. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From biopython at maubp.freeserve.co.uk Thu Sep 4 14:36:44 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 4 Sep 2008 19:36:44 +0100 Subject: [Biopython-dev] Preparing for Biopython 1.48 In-Reply-To: <6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com> References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com> <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com> <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com> <6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com> Message-ID: <320fb6e00809041136q4c01360ane4b0efcc58a68b8a@mail.gmail.com> On Tue, Sep 2, 2008 at 8:29 PM, Tiago Ant?o wrote: > On Tue, Sep 2, 2008 at 8:05 PM, Peter wrote: > >> This reminds me that I could/should make a PDF version of the BOSC >> 2008 slides to go online here: >> http://biopython.org/wiki/Documentation#Presentations >> > > http://www.slideshare.net/tiago/bosc-2008-biopython > Is there for a month, by I completely forgot to inform. I spotted the slide share thing (it surprised me last year) and added that to the wiki page already. On to practical matters, I've just done a clean check out on Linux and run the test suite, everything passes except these ones with external dependencies: test_GFF ... skipping. Environment is not configured for this test (not important if you do not plan to use Bio.GFF). test_PopGen_FDist ... skipping. Fdist not found (not a problem if you do not intend to use it). test_PopGen_SimCoal ... skipping. SimCoal not found (not a problem if you do not intend to use it). test_Wise ... skipping. sh: dnal: command not found test_psw ... skipping. sh: dnal: command not found I've never installed any of the PopGen tools - I presume these tests are still OK on your machine(s) Tiago? I've also never installed dnal, or setup whatever test_GFF wants. Could anyone confirm these are OK? I have run the BioSQL tests though. 
Peter From tiagoantao at gmail.com Fri Sep 5 06:05:26 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Fri, 5 Sep 2008 11:05:26 +0100 Subject: [Biopython-dev] Preparing for Biopython 1.48 In-Reply-To: <320fb6e00809041136q4c01360ane4b0efcc58a68b8a@mail.gmail.com> References: <320fb6e00809020849i7c480997g810ffbc523143f47@mail.gmail.com> <6d941f120809021125u7188d67l2612fd0f09277abc@mail.gmail.com> <320fb6e00809021205v4ec1a8f3wa3997881ad1e7d07@mail.gmail.com> <6d941f120809021229u389d1550re8bb7ec4ad3fc5b@mail.gmail.com> <320fb6e00809041136q4c01360ane4b0efcc58a68b8a@mail.gmail.com> Message-ID: <6d941f120809050305v7f991fe0jcfdfc650936e7348@mail.gmail.com> On Thu, Sep 4, 2008 at 7:36 PM, Peter wrote: > I've never installed any of the PopGen tools - I presume these tests > are still OK on your machine(s) Tiago? > > All tests are OK here (Linux x86). From biopython at maubp.freeserve.co.uk Fri Sep 5 06:43:27 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 5 Sep 2008 11:43:27 +0100 Subject: [Biopython-dev] CVS freeze for Biopython 1.48 Message-ID: <320fb6e00809050343y598c436fi9aa65ec272f1492d@mail.gmail.com> Dear all, I'm going to try and put together Biopython 1.48 this afternoon, so could you all not commit any changes until further notice please. I'll be doing the source code releases (and possibly a Windows installer for Python 2.3 tonight if my old laptop still has all the MS compilers working), but there will then be a slight delay while we get the (other) Windows installers done. Thank you, Peter From biopython at maubp.freeserve.co.uk Fri Sep 5 19:19:13 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 6 Sep 2008 00:19:13 +0100 Subject: [Biopython-dev] Nexus issue on Windows with Python 2.3 Message-ID: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com> Can anyone using Windows and/or Python 2.3 try running the test suite? 
I'm seeing a problem with test_SeqIO.py and test_AlignIO.py when they
call Bio.Nexus to construct a particular Nexus object (using sequences
originally read in from Tests/Nexus/test_Nexus_input.nex for what it's
worth). This triggers:

TypeError: zip() requires at least one sequence

On Linux with Python 2.4, and Mac OS X with Python 2.5 these two tests
both passed for me.

Peter

From biopython at maubp.freeserve.co.uk Sat Sep 6 05:06:20 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 6 Sep 2008 10:06:20 +0100
Subject: [Biopython-dev] Nexus issue on Windows with Python 2.3
In-Reply-To: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com>
References: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com>
Message-ID: <320fb6e00809060206q88a44cblb1188380ae921dde@mail.gmail.com>

On Sat, Sep 6, 2008 at 12:19 AM, Peter wrote:
> Can anyone using Windows and/or Python 2.3 try running the test suite?
>
> I'm seeing a problem with test_SeqIO.py and test_AlignIO.py when they
> call Bio.Nexus to construct a particular Nexus object (using sequences
> originally read in from Tests/Nexus/test_Nexus_input.nex for what it's
> worth). This triggers:
>
> TypeError: zip() requires at least one sequence
>
> On Linux with Python 2.4, and Mac OS X with Python 2.5 these two tests
> both passed for me.

This was a Python 2.3 problem: the Nexus code (Bio/Nexus/Nexus.py line
1633) was using a Python 2.4+ only feature (calling zip() with no
sequences), see
http://docs.python.org/lib/built-in-funcs.html

sitesm=zip(*[self.matrix[t].tostring() for t in self.taxlabels])

I've added a check in CVS for an empty list of taxlabels (with a
comment about Python 2.3).
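The failure comes from unpacking an empty list into zip(): on Python 2.3, zip() with no arguments raises the TypeError quoted above, whereas from Python 2.4 onwards it returns an empty result. A minimal sketch of the kind of guard described, written as a stand-alone function; the name alignment_columns and the use of plain strings as rows are assumptions for illustration, and this is not the actual change made to Bio/Nexus/Nexus.py:

```python
def alignment_columns(rows):
    """Transpose equal-length sequence strings into a list of column tuples."""
    if not rows:
        # With no rows there is nothing to unpack into zip(); returning early
        # avoids relying on zip() accepting zero arguments (Python 2.4+).
        return []
    return list(zip(*rows))


print(alignment_columns(["ACGT", "AC-T", "TCGT"]))
print(alignment_columns([]))
```

The explicit empty check costs one line and keeps the behaviour the same on every Python version, which is why it is a common way to handle this case.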
Peter

From biopython at maubp.freeserve.co.uk Sat Sep 6 06:04:08 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Sat, 6 Sep 2008 11:04:08 +0100
Subject: [Biopython-dev] New line issues in the source zip or tarballs
Message-ID: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com>

I've run into a little issue on Windows while preparing Biopython 1.48.

If you check out the latest code from CVS on Windows, then assuming
the CVS client is set up correctly, all the python files and plain text
input files get DOS/Windows newlines. Running the test suite looks OK
(there are a few little known issues I've previously mentioned).

However, having built a release on Linux, it seems both the tarball
and the zip file contain the text files using Unix newlines. I can
build Biopython on Windows from the zip file (Unix newlines are not a
problem for running the python code), but it does break a few of the
unit tests (test_SCOP_Cla.py, test_SCOP_Raf.py and
test_PopGen_SimCoal_nodepend.py).

This is only an issue for the minority of Windows users who will
actually run the test suite. Most will just use the click-and-run
installers which don't include the tests, and I expect anyone trying
to build Biopython on Windows will probably use CVS. So we could just
ignore this for the time being...

One solution would be to try and tweak the source code distributions
so the tarball uses Linux line endings, while the zip file uses
DOS/Windows. This does seem nasty.

Or, we can try and tweak the failing unit tests to cope with their
input files in either format.

In the case of test_PopGen_SimCoal_nodepend.py the test fails because
it expects simple.par and simple_100_30.par to be exactly the same size
(in class TemplateTest, line 47). This is not going to be true
when the input file uses Unix new lines but the generated file uses
Windows new lines. Perhaps using a simple bit of code to load the
files line by line and compare them would work here?
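One way to make such a comparison independent of the newline convention is to compare the files line by line with the line endings stripped, rather than comparing file sizes. A minimal sketch, assuming both files are small enough to read into memory; the helper name same_lines is hypothetical and this is not the code used in test_PopGen_SimCoal_nodepend.py:

```python
def same_lines(path_a, path_b):
    """Return True if two files hold the same lines, ignoring newline style."""
    # Read as bytes so no newline translation happens behind our back;
    # splitlines() treats both Unix and DOS/Windows endings as line breaks.
    with open(path_a, "rb") as handle_a:
        lines_a = handle_a.read().splitlines()
    with open(path_b, "rb") as handle_b:
        lines_b = handle_b.read().splitlines()
    return lines_a == lines_b
```

A test could then assert same_lines("simple.par", "simple_100_30.par") in place of the size check, with the file names taken from the message above and their location left to the test.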
Peter From fkauff at biologie.uni-kl.de Mon Sep 8 03:34:32 2008 From: fkauff at biologie.uni-kl.de (Frank Kauff) Date: Mon, 08 Sep 2008 09:34:32 +0200 Subject: [Biopython-dev] Nexus issue on Windows with Python 2.3 In-Reply-To: <320fb6e00809060206q88a44cblb1188380ae921dde@mail.gmail.com> References: <320fb6e00809051619v762ee276ld0d44a300403bfeb@mail.gmail.com> <320fb6e00809060206q88a44cblb1188380ae921dde@mail.gmail.com> Message-ID: <48C4D588.701@biologie.uni-kl.de> Peter wrote: > On Sat, Sep 6, 2008 at 12:19 AM, Peter wrote: > >> Can anyone using Windows and/or Python 2.3 try running the test suite? >> >> I'm seeing a problem with test_SeqIO.py and test_AlignIO.py when they >> call Bio.Nexus to construct a particular Nexus object (using seqences >> originally read in from Tests/Nexus/test_Nexus_input.nex for what its >> worth). This triggers: >> >> TypeError: zip() requires at least one sequence >> >> On Linux with Python 2.4, and Mac OS X with Python 2.5 these two tests >> both passed for me. >> > > This was a python 2.3 problem, the Nexus code (Bio/Nexus/Nexus.py line > 1633) was using a Python 2.4+ only feature, see > http://docs.python.org/lib/built-in-funcs.html > > sitesm=zip(*[self.matrix[t].tostring() for t in self.taxlabels]) > > I've added a check in CVS for an empty list of taxlabels (with a > comment about python 2.3). > > Good catch. Thanks Peter for fixing it. > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > > -- J-Prof. Dr. Frank Kauff Molecular Phylogenetics FB Biologie, 13/276 TU Kaiserslautern Postfach 3049 67653 Kaiserslautern Tel. +49 (0)631 205-2562 Fax. 
+49 (0)631 205-2998
email: fkauff at biologie.uni-kl.de
skype: frank.kauff

From p.j.a.cock at googlemail.com Mon Sep 8 05:41:33 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 10:41:33 +0100
Subject: [Biopython-dev] test_SVDSuperimposer.py
Message-ID: <320fb6e00809080241x3db79410lf54dd1612e5e04cc@mail.gmail.com>

Hi all,

I've noticed test_SVDSuperimposer.py seems to stall/run for ever on
one of the Linux machines I have run it on. However, it is fine on my
main Linux machine and on Mac OS X. Has anyone else noticed
this? Maybe there is some common thread (e.g. version of Numeric or
something).

Peter

From p.j.a.cock at googlemail.com Mon Sep 8 06:43:17 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 11:43:17 +0100
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <364233.84511.qm@web62401.mail.re1.yahoo.com>
References: <320fb6e00809080241x3db79410lf54dd1612e5e04cc@mail.gmail.com> <364233.84511.qm@web62401.mail.re1.yahoo.com>
Message-ID: <320fb6e00809080343t78e068een8e50d0237d9852c8@mail.gmail.com>

On Mon, Sep 8, 2008 at 11:36 AM, Michiel de Hoon wrote:
> When installing Numerical Python, run
>
> python setup.py config
>
> before build, install.
> (assuming you are using Numerical Python version 24.2).
>
> --Michiel.

I've checked and it is version 24.2 that is installed on the machine
in question. I'm not sure if this was installed from source or via
the yum package manager, but Numeric seems to work.

$ python
Python 2.5 (r25:51908, Nov 23 2006, 18:40:28)
[GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>> Numeric.__version__
'24.2'

I imagine if there was something seriously wrong with this Numeric
installation it would have shown up in other unit tests. So it looks
like the version of Numeric isn't the issue. Any other ideas? I take
it you've never had a problem with test_SVDSuperimposer.py getting
stuck?
Thanks, Peter From mjldehoon at yahoo.com Mon Sep 8 06:36:44 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 8 Sep 2008 03:36:44 -0700 (PDT) Subject: [Biopython-dev] test_SVDSuperimposer.py In-Reply-To: <320fb6e00809080241x3db79410lf54dd1612e5e04cc@mail.gmail.com> Message-ID: <364233.84511.qm@web62401.mail.re1.yahoo.com> When installing Numerical Python, run python setup.py config before build, install. (assuming you are using Numerical Python version 24.2). --Michiel. --- On Mon, 9/8/08, Peter Cock wrote: > From: Peter Cock > Subject: [Biopython-dev] test_SVDSuperimposer.py > To: "BioPython-Dev Mailing List" > Date: Monday, September 8, 2008, 5:41 AM > Hi all, > > I've noticed test_SVDSuperimposer.py seems to stall/run > for ever on > one of the Linux machines I have run it one. However, on > my main > Linux machine it is fine, and on Mac OS X. Has anyone else > noticed > this? Maybe there is some common thread (e.g. version of > Numeric or > something). > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Mon Sep 8 07:20:57 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 8 Sep 2008 12:20:57 +0100 Subject: [Biopython-dev] CVS freeze for Biopython 1.48 In-Reply-To: <320fb6e00809050343y598c436fi9aa65ec272f1492d@mail.gmail.com> References: <320fb6e00809050343y598c436fi9aa65ec272f1492d@mail.gmail.com> Message-ID: <320fb6e00809080420y5f456ab4uc4ec42d845c25f1a@mail.gmail.com> On Fri, Sep 5, 2008 at 11:43 AM, Peter wrote: > Dear all, > > I'm going to try and put together Biopython 1.48 this afternoon, so > could you all not commit any changes until further notice please. This took longer than planned. I have now tagged CVS and uploaded the tar-ball and zip file for Biopython 1.48 to http://biopython.org/DIST/ as usual. 
Before we make the public announcement (email, news server, and wiki
pages), having one or two people try downloading these, installing
from source and running the unit tests would be great.

Little things (like documentation improvements!) can go into CVS now,
but could you all refrain from any major changes (like Numeric/numpy,
additional deprecations or removals) until the release has been public
for a few days without issue? Just in case we have to tweak things,
this would make dealing with CVS easier. Thanks.

> I'll be doing the source code releases (and possibly a Windows
> installer for Python 2.3 tonight if my old laptop still has all the MS
> compilers working), but there will then be a slight delay while we get
> the (other) Windows installers done.

I may be able to do the Python 2.3 Windows installer tonight - we'll see.

For future reference, hevea 1.08 which my Linux box had installed
doesn't work nicely on the tutorial (the title page information goes
missing), but hevea 1.10 is fine.
http://biopython.org/DIST/docs/tutorial/Tutorial.pdf
http://biopython.org/DIST/docs/tutorial/Tutorial.html

Also, we're currently using epydoc version 3.0.1 for the API documentation:
http://biopython.org/DIST/docs/api/

I did check in a few more module level docstrings so this does look a
bit more complete than in Biopython 1.47. There is still room for
improvement, for example Bio.SeqUtils needs some love. Also, many of
the deprecated modules don't say they are deprecated in the module
level docstring, which I think would be a good thing to do. Any views
on this?
Thanks, Peter From tiagoantao at gmail.com Mon Sep 8 07:42:51 2008 From: tiagoantao at gmail.com (=?ISO-8859-1?Q?Tiago_Ant=E3o?=) Date: Mon, 8 Sep 2008 12:42:51 +0100 Subject: [Biopython-dev] New line issues in the source zip or tarballs In-Reply-To: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com> References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com> Message-ID: <6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com> Hi, On Sat, Sep 6, 2008 at 11:04 AM, Peter wrote: > In the case of test_PopGen_SimCoal_nodepend.py the failure is > expecting simple.par and simple_100_30.par to be exactly the same size > (in class TemplateTest, line 47). This is not true going to be true > when the input file uses Unix new lines but the generated file uses > Windows new lines. Perhaps using a simple bit of code to load the > files line by line and compare them would work here? > I am currently at a workshop (I belong to the organization committee, so I don't have much time), but I will try to sort this in the next couple of days. Tiago From biopython at maubp.freeserve.co.uk Mon Sep 8 08:14:09 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 8 Sep 2008 13:14:09 +0100 Subject: [Biopython-dev] New line issues in the source zip or tarballs In-Reply-To: <6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com> References: <320fb6e00809060304h429f1085r301170aa93d4eb73@mail.gmail.com> <6d941f120809080442r1797666eu70e35c60353c5462@mail.gmail.com> Message-ID: <320fb6e00809080514u5df6d9dej144c783076cbe467@mail.gmail.com> Tiago wrote: > Peter wrote: >> In the case of test_PopGen_SimCoal_nodepend.py the failure is >> expecting simple.par and simple_100_30.par to be exactly the same size >> (in class TemplateTest, line 47). This is not true going to be true >> when the input file uses Unix new lines but the generated file uses >> Windows new lines. 
Perhaps using a simple bit of code to load the >> files line by line and compare them would work here? > > I am currently at a workshop (I belong to the organization committee, so I > don't have much time), but I will try to sort this in the next couple of > days. Hi Tiago, This issue new line issue has probably been there since Biopython 1.45 without anyone else spotting it, so I don't see fixing it as urgent. Hopefully we can resolve this for the next release instead. I hope your workshop goes well, Peter From mjldehoon at yahoo.com Mon Sep 8 08:11:56 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 8 Sep 2008 05:11:56 -0700 (PDT) Subject: [Biopython-dev] test_SVDSuperimposer.py In-Reply-To: <320fb6e00809080343t78e068een8e50d0237d9852c8@mail.gmail.com> Message-ID: <339043.58717.qm@web62407.mail.re1.yahoo.com> Try if the eigenvalues function in Numerical Python works. If it hangs, you'll know the problem is in Numerical Python. --Michiel --- On Mon, 9/8/08, Peter Cock wrote: > From: Peter Cock > Subject: Re: [Biopython-dev] test_SVDSuperimposer.py > To: mjldehoon at yahoo.com > Cc: "BioPython-Dev Mailing List" > Date: Monday, September 8, 2008, 6:43 AM > On Mon, Sep 8, 2008 at 11:36 AM, Michiel de Hoon > wrote: > > When installing Numerical Python, run > > > > python setup.py config > > > > before build, install. > > (assuming you are using Numerical Python version > 24.2). > > > > --Michiel. > > I've checked and it is version 24.2 that is installed > on the machine > in question. I'm not sure if this was installed from > source or via > the yum package manager, but Numeric seems to work. > > $ python > Python 2.5 (r25:51908, Nov 23 2006, 18:40:28) > [GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2 > Type "help", "copyright", > "credits" or "license" for more > information. 
> >>> import Numeric
> >>> Numeric.__version__
> '24.2'
>
> I imagine if there was something seriously wrong with this Numeric
> installation it would have shown up in other unit tests. So it looks
> like the version of Numeric isn't the issue. Any other ideas?
>
> I take it you've never had a problem with
> test_SVDSuperimposer.py getting stuck?
>
> Thanks,
>
> Peter

From p.j.a.cock at googlemail.com Mon Sep 8 08:24:01 2008
From: p.j.a.cock at googlemail.com (Peter Cock)
Date: Mon, 8 Sep 2008 13:24:01 +0100
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <339043.58717.qm@web62407.mail.re1.yahoo.com>
References: <320fb6e00809080343t78e068een8e50d0237d9852c8@mail.gmail.com>
 <339043.58717.qm@web62407.mail.re1.yahoo.com>
Message-ID: <320fb6e00809080524i1f75c601p2a7191b6207bd2e@mail.gmail.com>

On Mon, Sep 8, 2008 at 1:11 PM, Michiel de Hoon wrote:
> Try if the eigenvalues function in Numerical Python works. If it hangs, you'll know the problem is in Numerical Python.

Good thinking - it does indeed hang on the machine in question,

$ python
Python 2.5 (r25:51908, Nov 23 2006, 18:40:28)
[GCC 4.1.1 20061011 (Red Hat 4.1.1-30)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric, LinearAlgebra
>>> Numeric.__version__
'24.2'
>>> data = Numeric.array([[1,2,3],[4,5,6],[7,8,9]])
>>> LinearAlgebra.eigenvalues(data)
[hangs here]

This works fine on another Linux box,

$ python
Python 2.4.3 (#1, Jun 27 2006, 16:32:39)
[GCC 3.4.5 20051201 (Red Hat 3.4.5-2)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric, LinearAlgebra
>>> Numeric.__version__
'24.2'
>>> data = Numeric.array([[1,2,3],[4,5,6],[7,8,9]])
>>> LinearAlgebra.eigenvalues(data)
array([ 1.61168440e+01, -1.11684397e+00, -1.30367773e-15])

And this example also works on the Mac:

$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc.
build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric, LinearAlgebra
>>> Numeric.__version__
'24.2'
>>> data = Numeric.array([[1,2,3],[4,5,6],[7,8,9]])
>>> LinearAlgebra.eigenvalues(data)
array([ 1.61168440e+01, -1.11684397e+00, -1.30367773e-15])

So we can probably rule out a problem with Biopython in
test_SVDSuperimposer.py which is good, but I should probably try and
work out what is wrong with Numeric on this particular machine...

Thanks for your advice,

Peter

From mjldehoon at yahoo.com Mon Sep 8 08:56:37 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Mon, 8 Sep 2008 05:56:37 -0700 (PDT)
Subject: [Biopython-dev] test_SVDSuperimposer.py
In-Reply-To: <320fb6e00809080524i1f75c601p2a7191b6207bd2e@mail.gmail.com>
Message-ID: <622098.12504.qm@web62404.mail.re1.yahoo.com>

> So we can probably rule out a problem with Biopython in
> test_SVDSuperimposer.py which is good, but I should probably try and
> work out what is wrong with Numeric on this particular machine...
>

Have a look at this:
http://projects.scipy.org/pipermail/numpy-discussion/2004-January/015074.html

--Michiel.

From quwubin at gmail.com Mon Sep 8 09:59:57 2008
From: quwubin at gmail.com (Wubin Qu)
Date: Mon, 8 Sep 2008 21:59:57 +0800
Subject: [Biopython-dev] BioPythonGUI: Graphical User Interface for BioPython
Message-ID:

Hi all,

I started a new project named BioPythonGUI for a few of days. The following is the 'About' page from BioPythonGUI project.

BioPythonGUI is a Graphical User Interface of BioPython. BioPython is a widely used python module set in bioinformatics. It help researchers:

- Parsing files in different database formats
- Interfaces into programs like Blast, Entrez and PubMed
- A sequence class (can transcribe, translate, invert, etc)
- Code for handling alignments of sequences
- Clustering algorithms
- etc.

However, it's not everyone can use the BioPython, especially ones who do not know much about the programming.
How can you expect a professor who never known about any programming to use BioPython to parse the BLAST report file? This is the problem which the BioPythonGUI would solve. I started the project with the goal "Everyone can use BioPython with BioPythonGUI". Until now, there are two modules SeqGUI and BlastGUI are available in BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send me the feedback. Please see the developer's blog for details. Project Blog: http://biopythongui.blogspot.com/ Download: https://sites.google.com/site/biopythongui/download Screenshots: http://picasaweb.google.com/quwubin/BioPythonGUI02# ______________________________ Best regards, Wubin Qu From p.j.a.cock at googlemail.com Mon Sep 8 10:12:15 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 8 Sep 2008 15:12:15 +0100 Subject: [Biopython-dev] [BioPython] BioPythonGUI: Graphical User Interface for BioPython In-Reply-To: References: Message-ID: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com> On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote: > Hi all, > > I started a new project named BioPythonGUI for a few of days. Hello Wubin Qu, > BioPythonGUI is a Graphical User Interface of BioPython. I'm uncomfortable about the name BioPythonGUI, as this to me implies it is part of Biopython (whereas is it currently just a third party project built on top of Biopython). What do other people think? > However, it's not everyone can use the BioPython, especially ones who do not > know much about the programming. How can you expect a professor who never > known about any programming to use BioPython to parse the BLAST report file? > This is the problem which the BioPythonGUI would solve. I started the > project with the goal "Everyone can use BioPython with BioPythonGUI". I don't really understand your goal. How would a non-programming professor use your program to parse a BLAST report file? 
The NCBI already try and make the HTML and plain text output useful to non-programmers and from looking at the screenshots I don't see how your tool would help.

> Until now, there are two modules SeqGUI and BlastGUI are available in
> BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send me
> the feedback.

I see your module SeqGUI builds on the SeqGui.py in BioPython (in the scripts directory). It might make sense to include your improvements to this code as part of Biopython. I haven't looked at your code yet, so I don't know how much you've changed things.

It is nice to be able to translate, transcribe, reverse complement etc in a GUI, but personally I don't see the point of writing a little application just for this. Also, there are probably many many existing tools out there that already offer this functionality. However, I am happy writing code, so I am not in your target audience.

Regarding your BlastGUI idea, I can see that a GUI for standalone blast is nicer than the command line for some people. However, I don't see how this is more useful than running a local blast web server (something the NCBI already provides).

Sorry for being so negative,

Peter

From quwubin at gmail.com Mon Sep 8 10:38:27 2008
From: quwubin at gmail.com (Wubin Qu)
Date: Mon, 8 Sep 2008 22:38:27 +0800
Subject: [Biopython-dev] [BioPython] BioPythonGUI: Graphical User Interface for BioPython
In-Reply-To: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com>
References: <320fb6e00809080712v6c33d42fheb982f52e62e6e95@mail.gmail.com>
Message-ID:

Hi Peter,

Thanks for your reply.

My goal is simple: Programs with a GUI are easy to use. BioPython with a GUI will make things easier for people. The next module is: BlastParserGUI. I think it will be useful.

Yes, SeqGUI is built on SeqGui.py. And I learned a lot from SeqGui.py. It inspires me to build other modules. I mentioned this here .
______________________________ Best regards, Wubin Qu 2008/9/8 Peter Cock > On Mon, Sep 8, 2008 at 2:43 PM, Wubin Qu wrote: > > Hi all, > > > > I started a new project named BioPythonGUI for a few of days. > > Hello Wubin Qu, > > > BioPythonGUI is a Graphical User Interface of BioPython. > > I'm uncomfortable about the name BioPythonGUI, as this to me implies > it is part of Biopython (whereas is it currently just a third party > project built on top of Biopython). What do other people think? > > > However, it's not everyone can use the BioPython, especially ones who do > not > > know much about the programming. How can you expect a professor who never > > known about any programming to use BioPython to parse the BLAST report > file? > > This is the problem which the BioPythonGUI would solve. I started the > > project with the goal "Everyone can use BioPython with BioPythonGUI". > > I don't really understand your goal. How would a non-programming > professor use your program to parse a BLAST report file? The NCBI > already try and make the HTML and plain text output useful to > non-programmers and from looking at the screenshots I don't see how > your tool would help. > > > Until now, there are two modules SeqGUI and BlastGUI are available in > > BioPythonGUI. I would greatly appreciate if you use BioPythonGUI and send > me > > the feedback. > > I see your module SeqGUI builds on the SeqGui.py in BioPython (in the > scripts directory). It might make sense to include your improvements > to this code as part of Biopython. I haven't looked at your code yet, > so I don't know how much you've changed things. > > It is nice to be able to be able to translate, transcribe, reverse > complement etc in a GUI, but personally I don't see the point or > writing a little application just for this. Also, there are probably > many many existing tools out there that already offer this > functionality. However, I am happy writing code, so I am not in your > target audience. 
>
> Regarding your BlastGUI idea, I can see that a GUI for standalone
> blast is nicer than the command line for some people. However, I
> don't see how this is more useful than running a local blast web
> server (something the NCBI already provides).
>
> Sorry for being so negative,
>
> Peter
>

From biopython at maubp.freeserve.co.uk Tue Sep 9 06:14:11 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 9 Sep 2008 11:14:11 +0100
Subject: [Biopython-dev] Biopython 1.48 released
Message-ID: <320fb6e00809090314s722f404bqda71d7d9f97360e7@mail.gmail.com>

We are pleased to announce the release of Biopython 1.48.

Some new functionality has been added, a few bugs have been fixed, the documentation has been updated, plus several obsolete modules have been deprecated (or explicitly labelled as obsolete).

The following additional file formats are now supported in Bio.SeqIO and Bio.AlignIO:

* reading and writing "tab" format (simple tab separated)
* writing "nexus" files
* reading "pir" files (NBRF/PIR)
* basic support for writing "genbank" files (GenBank plain text)

This release also fixes some problems reading Clustal alignments (introduced in Biopython 1.46 when consolidating Bio.AlignIO and Bio.Clustalw), and includes some updates to the Bio.Sequencing parsers.

The SeqRecord and Alignment objects have a new method to get the object as a string in a given file format (handled via Bio.SeqIO and Bio.AlignIO).

Bio.PubMed and the online code in Bio.GenBank are now considered obsolete, and we intend to deprecate them after the next release. For accessing PubMed and GenBank, please use Bio.Entrez instead.

Martel and Bio.Mindy are now considered to be obsolete, and are likely to be deprecated and removed in a future release, at which point we will drop the optional dependency on mxTextTools.

Bio.Fasta is also considered to be obsolete, please use Bio.SeqIO instead.
We do intend to deprecate this module eventually; however, for several years this was the primary FASTA parsing module in Biopython and it is likely to be in use in many existing scripts.

In addition a number of other modules have been deprecated, including: Bio.MetaTool, Bio.EUtils, Bio.Saf, Bio.NBRF, and Bio.IntelliGenetics - see the DEPRECATED file for full details.

Source distributions are available from the Biopython website at http://biopython.org, and Windows installers will be added shortly.

My thanks to all bug reporters, code contributors and others who made this new release possible.

Peter, on behalf of the Biopython developers

P.S. This message will be forwarded to the Biopython announcement mailing list shortly. For those of you who prefer news readers to email lists, have a look at the OBF news server:
http://news.open-bio.org/news/2008/09/biopython-release-148/
where there are Biopython news feeds available:
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rdf
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss
http://news.open-bio.org/news/category/obf-projects/biopython/feed/rss2
http://news.open-bio.org/news/category/obf-projects/biopython/feed/atom

From mhampton at d.umn.edu Tue Sep 9 10:21:15 2008
From: mhampton at d.umn.edu (Marshall Hampton)
Date: Tue, 9 Sep 2008 09:21:15 -0500 (CDT)
Subject: [Biopython-dev] BioPythonGUI: Graphical User Interface for BioPython
Message-ID:

Hi,

I think I'd mentioned this before on this list, but the BioPythonGUI post made me think I should again: people interested in GUIs and visualization with biopython should check out the Sage project: www.sagemath.org. I help maintain the inclusion of biopython in Sage as an optional package. Sage is a python-based computational platform that unites a great deal of mathematical software, and uses a web browser as its GUI. This makes sharing code very easy.
I teach a bioinformatics course using Sage and biopython, which has been working very well.

I wrote a brief introduction that gives some idea of what is possible with the Sage/biopython combination:
http://openwetware.org/wiki/Open_writing_projects/Sage_and_cython_a_brief_introduction

...and the Sage wiki @interact examples might also give some ideas:
http://wiki.sagemath.org/interact

Cheers,
Marshall Hampton
University of Minnesota, Duluth

PS. Sorry if this gets sent twice, I think I messed up the list address the first time.

From quwubin at gmail.com Tue Sep 9 20:28:55 2008
From: quwubin at gmail.com (Wubin Qu)
Date: Wed, 10 Sep 2008 08:28:55 +0800
Subject: [Biopython-dev] BioPythonGUI: Graphical User Interface for BioPython
In-Reply-To:
References:
Message-ID:

Hi,

Thank you. I am learning Sage now.
I think that is another way of GUI and it's great. I'm sure I will learn a lot from Sage. ______________________________ Best regards, Wubin Qu 2008/9/9 Marshall Hampton > > Hi, > > I think I'd mentioned this before on this list, but the BioPythonGUI post > made me think I should again: people interested in GUIs and visualization > with biopython should check out the Sage project: www.sagemath.org. I > help maintain the inclusion of biopython in Sage as an optional package. > Sage is a python-based computational platform that unites a great deal of > mathematical software, and uses a web browser as its GUI. This makes > sharing code very easy. > > I teach a bioinformatics course using Sage and biopython, which has been > working very well. > > I wrote a brief introduction that gives some idea of what is possible with > the Sage/biopython combination: > > http://openwetware.org/wiki/Open_writing_projects/Sage_and_cython_a_brief_introduction > > ...and the Sage wiki @interact examples might also give some ideas: > http://wiki.sagemath.org/interact > > Cheers, > Marshall Hampton > University of Minnesota, Duluth > From bugzilla-daemon at portal.open-bio.org Wed Sep 10 05:03:43 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 10 Sep 2008 05:03:43 -0400 Subject: [Biopython-dev] [Bug 2583] New: small bug in NCBIXML.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2583 Summary: small bug in NCBIXML.py Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: minor Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: christen at unice.fr Hi When parsing an xml blast output b_record.sc_match and b_record.sc_mismatch are returned as none this is because the lines self._blast.sc_match=self._parameters.sc_match self._blast.sc_mismatch=self._parameters.sc_mismatch are missing in def _end_Iteration(self): This is a minor bug because it 
is very rare that a user wants these informations, as usually they know the parameters they used to run blast. Best regards Richard Christen, U of Nice, France -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Wed Sep 10 10:28:03 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 10 Sep 2008 07:28:03 -0700 (PDT) Subject: [Biopython-dev] NumPy conversion roadmap Message-ID: <909028.58880.qm@web62404.mail.re1.yahoo.com> Hi everybody, Now that Biopython 1.48 is out (thanks Peter!), we can now start to consider the conversion from Numerical Python to NumPy. I'd like to propose the following steps: 1) Let's wait for a week or so before making any NumPy-related commits to see if any serious problems show up with the 1.48 release. 2) Three modules use Numerical Python at the C-level: Bio.Cluster, Bio.KDTree, and Bio.Affy. I have a NumPy-based module ready for Bio.Cluster. For Bio.KDTree and Bio.Affy, see my next mails. 3) Once these three modules are converted, Biopython can be compiled again. We can then consider the modules that use Numerical Python at the Python-level. There are about ten of those. Some of them are heavily used (such as Bio.PDB), whereas others are more obscure. Conversion is usually trivial, but I'd like to suggest that we take this opportunity also to review each of these modules to see if any should be deprecated. Comments, anybody? --Michiel. From mjldehoon at yahoo.com Wed Sep 10 10:28:53 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 10 Sep 2008 07:28:53 -0700 (PDT) Subject: [Biopython-dev] Bio.Affy Message-ID: <790264.35889.qm@web62401.mail.re1.yahoo.com> Hi everybody, The C++ code in Bio.Affy seems to be out of date; it is distributed with the Biopython releases but it is not actually used. 
There's a comment in setup.py saying that this C++ code was replaced by Python code. Does anybody know more about this? Can the C++ code in Bio.Affy be removed? --Michiel. From mjldehoon at yahoo.com Wed Sep 10 10:35:22 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 10 Sep 2008 07:35:22 -0700 (PDT) Subject: [Biopython-dev] Bio.KDTree Message-ID: <941377.41064.qm@web62401.mail.re1.yahoo.com> Hi everybody, I have a prototype version of Bio.KDTree for NumPy. This code differs from the current Bio.KDTree in that is uses C instead of C++. Thomas (or anybody else), any objections if I upload this version to CVS to replace the current Bio.KDTree? --Michiel. From chapmanb at 50mail.com Wed Sep 10 16:26:13 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Wed, 10 Sep 2008 16:26:13 -0400 Subject: [Biopython-dev] NumPy conversion roadmap Message-ID: <20080910202613.GR21009@localdomain> Hi all; Congrats on the 1.48 release. Great stuff. Michiel, I wanted to follow up on your NumPy conversion plans. I have those NumPy changes discussed on the main list earlier this month ready to check in, along with tests passing and documentation changes and all those good things. This does very basic conversions to NumPy using the compatibility modules. It sounds like a good path would be for me to check these changes in as a starting point and then you can go with your in depth changes from there. Hopefully, this will save you some time finding all the imports and that kind of fun. Any objections? If not, I can get these in right away and you can go from there. 
Brad
--
Brad Chapman
Codon Devices
http://www.codondevices.com

From bugzilla-daemon at portal.open-bio.org Wed Sep 10 20:29:42 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 10 Sep 2008 20:29:42 -0400
Subject: [Biopython-dev] [Bug 2585] New: Error in Bio.SeqUtils.apply_on_multi_fasta
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2585

           Summary: Error in Bio.SeqUtils.apply_on_multi_fasta
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: sbassi at gmail.com

Function "apply_on_multi_fasta" (in SeqUtils) uses record attributes that are no longer valid.

See this line:

arguments = [record.sequence]

And this line:

results.append('>%s\n%s' % (record.title, result))

This provokes an error when trying to run this function (sorry I don't have the error message on this computer).

A possible replacement for both lines:

arguments = [record.seq]

and:

results.append('>%s\n%s' % (record.name, result))

--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.
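
To make the replacement suggested in bug 2585 above a little more concrete, here is a minimal, self-contained sketch of the loop with the two attribute names changed. It is not the actual Bio.SeqUtils code: `Record` is a hypothetical stand-in that models only the `.name` and `.seq` attributes named in the report, and `apply_to_records` and `reverse` are illustrative names.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed FASTA record; only the two attributes
# mentioned in the bug report (.name and .seq) are modelled here.
Record = namedtuple("Record", ["name", "seq"])


def apply_to_records(records, func, *args):
    """Apply func(seq, *args) to each record and collect FASTA-style strings."""
    results = []
    for record in records:
        arguments = [record.seq]  # the report's replacement for record.sequence
        arguments.extend(args)
        result = func(*arguments)
        if result:
            # the report's replacement for record.title
            results.append(">%s\n%s" % (record.name, result))
    return results


def reverse(seq):
    """Toy per-sequence function used only for this illustration."""
    return seq[::-1]


records = [Record("seq1", "ACGT"), Record("empty", "")]
output = apply_to_records(records, reverse)
```

Records whose result is empty are skipped, so only `seq1` contributes an entry to `output` here.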
From bugzilla-daemon at portal.open-bio.org Thu Sep 11 04:01:52 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Sep 2008 04:01:52 -0400 Subject: [Biopython-dev] [Bug 2586] New: New version of MeltingTemp.py Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2586 Summary: New version of MeltingTemp.py Product: Biopython Version: 1.48 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: sbassi at gmail.com This version of MeltingTemp.py has a quick test and some reformatting to make it easier to read (code style changed) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Sep 11 04:03:54 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 11 Sep 2008 04:03:54 -0400 Subject: [Biopython-dev] [Bug 2586] New version of MeltingTemp.py In-Reply-To: Message-ID: <200809110803.m8B83sr2017438@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2586 ------- Comment #1 from sbassi at gmail.com 2008-09-11 04:03 EST ------- Created an attachment (id=994) --> (http://bugzilla.open-bio.org/attachment.cgi?id=994&action=view) New version of MeltingTemp.py -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From mjldehoon at yahoo.com Thu Sep 11 07:06:45 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 11 Sep 2008 04:06:45 -0700 (PDT) Subject: [Biopython-dev] NumPy conversion roadmap In-Reply-To: <20080910202613.GR21009@localdomain> Message-ID: <828420.78524.qm@web62408.mail.re1.yahoo.com> --- On Wed, 9/10/08, Brad Chapman wrote: > I have those NumPy changes discussed on the main list earlier > this month ready to check in, along with tests passing and > documentation changes and all those good things. > Thanks! Those changes are for Bio.PDB, right? Bio.PDB being a heavily used module, your changes are very welcome. In a sense, Thomas has the last word on changes to Bio.PDB, since he wrote the module, but if there are no objections from Thomas then feel free to submit your changes to CVS. --Michiel. From chapmanb at 50mail.com Thu Sep 11 07:56:33 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Thu, 11 Sep 2008 07:56:33 -0400 Subject: [Biopython-dev] NumPy conversion roadmap In-Reply-To: <828420.78524.qm@web62408.mail.re1.yahoo.com> References: <20080910202613.GR21009@localdomain> <828420.78524.qm@web62408.mail.re1.yahoo.com> Message-ID: <20080911115633.GD6200@localdomain> Hi Michiel; > Thanks! > Those changes are for Bio.PDB, right? Bio.PDB being a heavily used > module, your changes are very welcome. In a sense, Thomas has the last > word on changes to Bio.PDB, since he wrote the module, but if there > are no objections from Thomas then feel free to submit your changes to > CVS. Yes, these handle PDB and all other Numeric modules. The changes are not to the code but rather to the imports so rather slight. We can move forward from here to a full port to NumPy if desired, but this should give the same functionality but allow people to have the up to date NumPy libraries. I checked everything in now so it should appear in CVS now. Let me know if there are any problems, and feel free to improve on these changes as y'all find best. 
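
For anyone wondering what an import-only change of this kind can look like, here is a rough sketch of a fallback import. The helper name `import_first` is made up for the illustration, and the candidate names below are standard-library stand-ins so the snippet runs anywhere; for the conversion discussed here the candidates would presumably be a NumPy compatibility module followed by the old Numeric module, but that pairing is an assumption, not a description of what was checked in.

```python
import importlib


def import_first(*candidates):
    """Return the first module in candidates that can be imported."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed: fall through to the next candidate.
            continue
    raise ImportError("none of these modules could be imported: %s"
                      % ", ".join(candidates))


# Stand-in example: the first name is deliberately not installed, so the
# helper falls back to the second one.
json_module = import_first("no_such_module_for_this_sketch", "json")
```

The rest of a module can then keep using one name for whichever library was found, which is what keeps the change limited to the import lines.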
Thanks, Brad -- Brad Chapman Codon Devices http://www.codondevices.com From mjldehoon at yahoo.com Sun Sep 14 09:23:53 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 14 Sep 2008 06:23:53 -0700 (PDT) Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster. Message-ID: <259643.39583.qm@web62401.mail.re1.yahoo.com> Hi everybody, I just committed a bunch of changes to Bio.Cluster, Bio.KDTree, and setup.py that deal with the old Numerical Python to new NumPy conversion. With these changes, Biopython should compile with NumPy; any remaining references to the old Numerical Python are at the Python-level only. Since these are rather big changes, please try with the current version of CVS to see if everything compiles cleanly and all tests pass. Comments, questions, suggestions are welcome. I also uploaded a plain C (instead of C++) version of Bio.KDTree, and adjusted setup.py accordingly. --Michiel. From chapmanb at 50mail.com Sun Sep 14 13:07:07 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Sun, 14 Sep 2008 13:07:07 -0400 Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster. In-Reply-To: <259643.39583.qm@web62401.mail.re1.yahoo.com> References: <259643.39583.qm@web62401.mail.re1.yahoo.com> Message-ID: <1221412027.6552.1273897487@webmail.messagingengine.com> Hi Michiel; Great stuff. 
One quick note, it doesn't look like KDTreemodule.c got checked into CVS:

gcc: Bio/KDTree/KDTreemodule.c: No such file or directory

> ls -l Bio/KDTree/
total 64
-rw-rw-r-- 1 chapmanb chapmanb  2641 2007-04-23 05:45 CKDTree.py
drwxrwxr-x 2 chapmanb chapmanb  4096 2008-09-14 12:39 CVS
-rw-rw-r-- 1 chapmanb chapmanb   166 2007-04-23 05:45 HISTORY
-rw-rw-r-- 1 chapmanb chapmanb   432 2007-04-23 05:45 __init__.py
-rw-rw-r-- 1 chapmanb synbio   29504 2008-09-14 09:15 KDTree.c
-rw-rw-r-- 1 chapmanb synbio     689 2008-09-14 12:39 KDTree.h
-rw-rw-r-- 1 chapmanb synbio    8165 2008-09-14 12:39 KDTree.py
-rw-rw-r-- 1 chapmanb synbio     151 2008-09-14 09:15 Neighbor.h

Brad

On Sun, 14 Sep 2008 06:23:53 -0700 (PDT), "Michiel de Hoon" said:
> Hi everybody,
>
> I just committed a bunch of changes to Bio.Cluster, Bio.KDTree, and
> setup.py that deal with the old Numerical Python to new NumPy conversion.
> With these changes, Biopython should compile with NumPy; any remaining
> references to the old Numerical Python are at the Python-level only.
> Since these are rather big changes, please try with the current version
> of CVS to see if everything compiles cleanly and all tests pass.
> Comments, questions, suggestions are welcome.
>
> I also uploaded a plain C (instead of C++) version of Bio.KDTree, and
> adjusted setup.py accordingly.
>
> --Michiel.
>
>
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
--
Brad Chapman
chapmanb at 50mail.com

From mjldehoon at yahoo.com Sun Sep 14 13:10:16 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Sun, 14 Sep 2008 10:10:16 -0700 (PDT)
Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster.
In-Reply-To: <259643.39583.qm@web62401.mail.re1.yahoo.com>
Message-ID: <673504.47838.qm@web62403.mail.re1.yahoo.com>

Hi everybody,

I just noticed that one file is missing in Biopython's CVS.
I'll upload it as soon as possible but it may take a day or so. Sorry for the trouble. --Michiel --- On Sun, 9/14/08, Michiel de Hoon wrote: > From: Michiel de Hoon > Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster. > To: biopython-dev at biopython.org > Date: Sunday, September 14, 2008, 9:23 AM > Hi everybody, > > I just committed a bunch of changes to Bio.Cluster, > Bio.KDTree, and setup.py that deal with the old Numerical > Python to new NumPy conversion. With these changes, > Biopython should compile with NumPy; any remaining > references to the old Numerical Python are at the > Python-level only. Since these are rather big changes, > please try with the current version of CVS to see if > everything compiles cleanly and all tests pass. Comments, > questions, suggestions are welcome. > > I also uploaded a plain C (instead of C++) version of > Bio.KDTree, and adjusted setup.py accordingly. > > --Michiel. > > > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mjldehoon at yahoo.com Tue Sep 16 07:12:03 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 16 Sep 2008 04:12:03 -0700 (PDT) Subject: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster. In-Reply-To: <1221412027.6552.1273897487@webmail.messagingengine.com> Message-ID: <901846.37002.qm@web62403.mail.re1.yahoo.com> Hi everybody, I now uploaded Bio/KDTree/KDTreemodule.c to CVS. Biopython should now compile with the new NumPy (and not any more with the old Numerical Python). --Michiel. --- On Sun, 9/14/08, Brad Chapman wrote: > From: Brad Chapman > Subject: Re: [Biopython-dev] NumPy conversion / Bio.KDTree / Bio.Cluster. > To: biopython-dev at biopython.org > Date: Sunday, September 14, 2008, 1:07 PM > Hi Michiel; > Great stuff. 
One quick note, it doesn't look like > KDTreemodule.c got > checked into CVS: > > gcc: Bio/KDTree/KDTreemodule.c: No such file or directory > > > ls -l Bio/KDTree/ > total 64 > -rw-rw-r-- 1 chapmanb chapmanb 2641 2007-04-23 05:45 > CKDTree.py > drwxrwxr-x 2 chapmanb chapmanb 4096 2008-09-14 12:39 CVS > -rw-rw-r-- 1 chapmanb chapmanb 166 2007-04-23 05:45 > HISTORY > -rw-rw-r-- 1 chapmanb chapmanb 432 2007-04-23 05:45 > __init__.py > -rw-rw-r-- 1 chapmanb synbio 29504 2008-09-14 09:15 > KDTree.c > -rw-rw-r-- 1 chapmanb synbio 689 2008-09-14 12:39 > KDTree.h > -rw-rw-r-- 1 chapmanb synbio 8165 2008-09-14 12:39 > KDTree.py > -rw-rw-r-- 1 chapmanb synbio 151 2008-09-14 09:15 > Neighbor.h > > Brad > > On Sun, 14 Sep 2008 06:23:53 -0700 (PDT), "Michiel de > Hoon" > said: > > Hi everybody, > > > > I just committed a bunch of changes to Bio.Cluster, > Bio.KDTree, and > > setup.py that deal with the old Numerical Python to > new NumPy conversion. > > With these changes, Biopython should compile with > NumPy; any remaining > > references to the old Numerical Python are at the > Python-level only. > > Since these are rather big changes, please try with > the current version > > of CVS to see if everything compiles cleanly and all > tests pass. > > Comments, questions, suggestions are welcome. > > > > I also uploaded a plain C (instead of C++) version of > Bio.KDTree, and > > adjusted setup.py accordingly. > > > > --Michiel. 
> > > > > > > > _______________________________________________ > > Biopython-dev mailing list > > Biopython-dev at lists.open-bio.org > > > http://lists.open-bio.org/mailman/listinfo/biopython-dev > -- > Brad Chapman > chapmanb at 50mail.com > > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From bugzilla-daemon at portal.open-bio.org Tue Sep 16 16:04:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Sep 2008 16:04:20 -0400 Subject: [Biopython-dev] [Bug 2583] small bug in NCBIXML.py In-Reply-To: Message-ID: <200809162004.m8GK4KKj016559@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2583 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-16 16:04 EST ------- I'd have looked at this earlier but was on holiday. I recall fixing a few similar issues in the past, but hadn't spotted these. I'll try and deal with this by the end of the week. Thanks Christen! Peter. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Tue Sep 16 16:07:07 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 16 Sep 2008 16:07:07 -0400 Subject: [Biopython-dev] [Bug 2585] Error in Bio.SeqUtils.apply_on_multi_fasta In-Reply-To: Message-ID: <200809162007.m8GK77S2016681@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2585 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-16 16:07 EST ------- That does look like a bug - I'll have to look over the history to see how this was originally intended to be used as the current docstring isn't very clear. 
Another option would be something like:

    results.append(record.format("fasta"))

From bugzilla-daemon at portal.open-bio.org Wed Sep 17 05:38:23 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 05:38:23 -0400
Subject: [Biopython-dev] [Bug 2583] small bug in NCBIXML.py
In-Reply-To:
Message-ID: <200809170938.m8H9cN2Y024263@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2583

------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 05:38 EST -------
Christen - what version of Biopython are you using?

The reason the match/mis-match issue sounded so familiar to me is that I fixed it in Biopython 1.46 after Sebastian Bassi reported it on the mailing list in March.

If you can confirm you are using Biopython 1.45 or older, then could you try updating your machine? We should then be able to mark this bug as fixed.

Thanks

Peter
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 07:17:23 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Sep 2008 07:17:23 -0400 Subject: [Biopython-dev] [Bug 2586] New version of MeltingTemp.py In-Reply-To: Message-ID: <200809171117.m8HBHNC9028796@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2586 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 07:17 EST ------- Updated checked in. Thanks Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Wed Sep 17 07:34:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Sep 2008 07:34:34 -0400 Subject: [Biopython-dev] [Bug 2585] Error in Bio.SeqUtils.apply_on_multi_fasta In-Reply-To: Message-ID: <200809171134.m8HBYYte029584@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2585 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 07:34 EST ------- This bug was introduced in CVS revision 1.13 of Bio/SeqUtils/__init__.py when moving from Bio.Fasta.RecordParser (which used Fasta record objects with a title property) to Bio.SeqIO (which uses SeqRecord objects instead). With hindsight this is a clear oversight (which also changed the usage of the function). Your fix looks fine for recovering some of the original behaviour. We should also clarify the docstrings of these (and other functions in this module) to make it explicit where the "file" argument should be a filename. 
However, I am tempted to deprecate apply_on_multi_fasta and quicker_apply_on_multi_fasta (and some of the other code here), as to me using Bio.SeqIO with a for loop is much clearer, e.g.

    def my_function ...
    for record in SeqIO.parse(open(filename), "fasta") :
        my_function(record)

versus:

    def my_function ...
    apply_on_multi_fasta(filename, my_function)

What do you think Sebastian? Did you have a real example for using apply_on_multi_fasta or did you happen to spot the bug?

Thanks,

Peter

From bugzilla-daemon at portal.open-bio.org Wed Sep 17 08:19:40 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Wed, 17 Sep 2008 08:19:40 -0400
Subject: [Biopython-dev] [Bug 2585] Error in Bio.SeqUtils.apply_on_multi_fasta
In-Reply-To:
Message-ID: <200809171219.m8HCJebF031531@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2585

------- Comment #3 from sbassi at gmail.com 2008-09-17 08:19 EST -------
(In reply to comment #2)
> What do you think Sebastian? Did you have a real example for using
> apply_on_multi_fasta or did you happen to spot the bug?

I don't use this function myself and I also think it is redundant. I spotted it just because I am checking most Biopython functions for a book on Python for bioinformatics I am writing.
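[Editorial illustration] The two styles compared in comment #2 (an explicit for loop over parsed records versus handing a function to a helper) can be sketched as below. This is a minimal sketch, not Biopython code: read_fasta() is a hypothetical stand-in for Bio.SeqIO.parse(handle, "fasta") so that the example runs without Biopython installed, and apply_on_records() and gc_fraction() are likewise names invented for the illustration.

```python
# Sketch only: read_fasta() is a hypothetical stand-in for
# Bio.SeqIO.parse(handle, "fasta") so the example runs without Biopython.
import io


def read_fasta(handle):
    """Yield (title, sequence) tuples from a FASTA-formatted handle."""
    title, chunks = None, []
    for line in handle:
        line = line.rstrip()
        if line.startswith(">"):
            # A new header line closes the previous record, if any.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        elif line and title is not None:
            chunks.append(line)
    if title is not None:
        yield title, "".join(chunks)


def apply_on_records(handle, function):
    """Callback style (hypothetical helper): call function on every record."""
    return [function(title, seq) for title, seq in read_fasta(handle)]


def gc_fraction(title, seq):
    """Example per-record function: fraction of G and C in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


FASTA_TEXT = ">seq1 demo\nGGCC\nAATT\n>seq2 demo\nGGGG\n"

# Style 1: explicit for loop over an iterator of records.
loop_results = []
for title, seq in read_fasta(io.StringIO(FASTA_TEXT)):
    loop_results.append(gc_fraction(title, seq))

# Style 2: hand the function to a helper that hides the loop.
callback_results = apply_on_records(io.StringIO(FASTA_TEXT), gc_fraction)
```

Both styles produce the same list here. The for loop keeps the iteration visible and lets the caller filter, break early, or combine several operations per record, which is the clarity argument made in the comment.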
From bugzilla-daemon at portal.open-bio.org Wed Sep 17 10:12:44 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Sep 2008 10:12:44 -0400 Subject: [Biopython-dev] [Bug 2585] Error in Bio.SeqUtils.apply_on_multi_fasta In-Reply-To: Message-ID: <200809171412.m8HECiP2005746@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2585 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-17 10:12 EST ------- I've made your suggested fix in CVS, and added a docstring to this and the related functions. I've described them as obsolete but will also suggest their deprecation on the mailing list... Thanks for your report. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Wed Sep 17 10:13:50 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Wed, 17 Sep 2008 07:13:50 -0700 (PDT) Subject: [Biopython-dev] Numpy conversion Message-ID: <998217.51422.qm@web62404.mail.re1.yahoo.com> Hi everybody, I am now looking at the pure-python modules that make use of Numerical Python / NumPy. Bio.kNN is one of them; this also happens to be the only module that imports Bio.distance, which also depends on NumPy. What I am not sure about is the usage of Bio.kNN. A quick google search didn't reveal much, suggesting that it is not widely used. Bio.kNN currently is not documented in the tutorial, but the code itself is reasonably well documented. How do you guys feel about this module? Should we keep it? --Michiel. 
From biopython at maubp.freeserve.co.uk Wed Sep 17 10:23:23 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Sep 2008 15:23:23 +0100 Subject: [Biopython-dev] Cleaning up Bio.SeqUtils Message-ID: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> Dear all, I've previously mentioned the idea of cleaning up Bio/SeqUtils/__init__.py in passing. I've been reminded about this by Bug 2585 where Sebastian spotted a problem in one of the FASTA related functions. http://bugzilla.open-bio.org/show_bug.cgi?id=2585 I've updated the docstrings in CVS to describe the three functions quick_FASTA_reader, apply_on_multi_fasta and quicker_apply_on_multi_fasta as obsolete but I would like to suggest going further and deprecating them. There are other dubious or redundant functions in Bio/SeqUtils/__init__.py such as a translate function. Again, would there be any objection to deprecating this too? Peter From biopython at maubp.freeserve.co.uk Wed Sep 17 10:29:35 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 17 Sep 2008 15:29:35 +0100 Subject: [Biopython-dev] Numpy conversion In-Reply-To: <998217.51422.qm@web62404.mail.re1.yahoo.com> References: <998217.51422.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00809170729g49b97488l629c4132c99b44f0@mail.gmail.com> On Wed, Sep 17, 2008 at 3:13 PM, Michiel de Hoon wrote: > Hi everybody, > > I am now looking at the pure-python modules that make use of Numerical Python / NumPy. > Bio.kNN is one of them; this also happens to be the only module that imports Bio.distance, > which also depends on NumPy. > > What I am not sure about is the usage of Bio.kNN. A quick google search didn't reveal much, > suggesting that it is not widely used. Bio.kNN currently is not documented in the tutorial, but > the code itself is reasonably well documented. > > How do you guys feel about this module? Should we keep it? > I've not used it myself, but it sounds handy. 
Michiel, does this overlap at all with your clustering module? Peter From sbassi at gmail.com Wed Sep 17 18:46:23 2008 From: sbassi at gmail.com (Sebastian Bassi) Date: Wed, 17 Sep 2008 19:46:23 -0300 Subject: [Biopython-dev] Author of "restriction tutorial"? Message-ID: I want to cite in a book the Restriction tutorial (http://biopython.org/DIST/docs/cookbook/Restriction.html) so I need author(s) name(s). I can't find the author name so I ask here to cite it properly. Best, SB. -- Vendo isla: http://www.genesdigitales.com/isla/ Curso Biologia Molecular para programadores: http://tinyurl.com/2vv8w6 Bioinformatics news: http://www.bioinformatica.info Tutorial libre de Python: http://tinyurl.com/2az5d5 From bugzilla-daemon at portal.open-bio.org Wed Sep 17 23:15:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 17 Sep 2008 23:15:51 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200809180315.m8I3Fpk9008139@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #19 from robert.cadena at gmail.com 2008-09-17 23:15 EST ------- A quick fix to this might be to call a .bat file that calls the blastall executable. For example: blastall_wrapper.bat: "c:/Documents and Settings/maldoror/My Documents/blast/bin/blastall.exe" %1 %2 %3 %4 %5 %6 %7 %8 %9 All arguments containing spaces should be escaped with "\"[arg]". For example: my_blast_db should be r"\"\\\"c:/documents and settings/maldoror/my documents/blast/bin/mine\"" When the above value is printed out it should look like: "\"c:/documents ...." finally, set my_blastall_exe to the batch file: "\"c:/documents and settings .../blastall_wrapper.bat\"" You still have to deal with the problem that os.path.exists and os.system expect the command with and without quotes. but, at least the batch file wrapper method should pass the arguments properly. 
hope it works on your system. best of luck. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Sep 18 05:58:04 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Sep 2008 10:58:04 +0100 Subject: [Biopython-dev] Author of "restriction tutorial"? In-Reply-To: References: Message-ID: <320fb6e00809180258v49f43d4ej7c03551172c9c638@mail.gmail.com> On Wed, Sep 17, 2008 at 11:46 PM, Sebastian Bassi wrote: > I want to cite in a book the Restriction tutorial > (http://biopython.org/DIST/docs/cookbook/Restriction.html) so I need > author(s) name(s). > I can't find the author name so I ask here to cite it properly. > Best, > SB. It is a little surprising the author didn't include his name in the HTML document, but looking at CVS and the mailing list archives, I think this is by Frederic Sohm. http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Doc/cookbook/Restriction/?cvsroot=biopython http://portal.open-bio.org/pipermail/biopython/2005-February/002548.html (I recall reading an earlier thread where Frederic offered the package with documentation, but I haven't found it again). Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 18 06:58:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Sep 2008 06:58:26 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200809181058.m8IAwQLw001437@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #20 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 06:58 EST ------- I think I've fixed some of this in CVS. Biopython should now cope with a blast exe or input file with spaces in the name - but thus far I have only tested this on Mac OS X. 
See Bio/Blast/NCBIStandalone.py revision 1.77 in CVS. You will be able to look at the changes and download them via the following URL shortly: http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/Blast/NCBIStandalone.py?cvsroot=biopython Dealing with database(s) where the path(s) contain spaces is trickier. I think the best solution here is to setup BLAST so that it knows where to find your databases, and then you can refer to them by name only (no paths, therefore no spaces). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Thu Sep 18 08:15:50 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 18 Sep 2008 13:15:50 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com> References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com> <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com> Message-ID: <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com> I wrote: >> Bio.expressions was already deprecated, and seems to be a dependency >> of the following modules, which I have now explicitly deprecated in CVS: I plan to remove these four deprecated modules shortly, unless anyone objects: Bio.expressions (deprecated in Biopython 1.44) Bio.config (explicitly deprecated in Biopython 1.48) Bio.dbdefs (explicitly deprecated in Biopython 1.48) Bio.formatdefs (explicitly deprecated in Biopython 1.48) At the same time I would remove the associated bit of unused code in Bio/__init__.py Peter From mjldehoon at yahoo.com Thu Sep 18 10:10:49 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: 
Thu, 18 Sep 2008 07:10:49 -0700 (PDT)
Subject: [Biopython-dev] Numpy conversion
In-Reply-To: <320fb6e00809170729g49b97488l629c4132c99b44f0@mail.gmail.com>
Message-ID: <37659.57326.qm@web62402.mail.re1.yahoo.com>

> I've not used it myself, but it sounds handy. Michiel,
> does this overlap at all with your clustering module?

No, it doesn't. Bio.Cluster contains unsupervised clustering methods only. The k-nearest neighbors method in Bio.kNN is a supervised learning method.

--Michiel.

From biopython at maubp.freeserve.co.uk Thu Sep 18 11:00:16 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 18 Sep 2008 16:00:16 +0100
Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work?
Message-ID: <320fb6e00809180800ic2752dfoe00801f67b57c65c@mail.gmail.com>

Hi all,

I'm back from my one week holiday, and after I updated my machine I'm seeing a new failure in test_MarkovModel.py, which is probably related to the Numeric/numpy work:

$ python run_tests.py test_MarkovModel.py
test_MarkovModel ...
ERROR
(output cut)

$python test_MarkovModel.py
TESTING train_visible
Training HMM
Classifying
[(['0', '0', '1', '2', '3', '3'], 0.0082128906250000053)]
STATES: 0 1 2 3
ALPHABET: A C G T
INITIAL:
  0: 1.00
  1: 0.00
  2: 0.00
  3: 0.00
TRANSITION:
  0: 0.20 0.80 0.00 0.00
  1: 0.00 0.50 0.50 0.00
  2: 0.00 0.00 0.50 0.50
  3: 0.00 0.00 0.00 1.00
EMISSION:
  0: 0.67 0.11 0.11 0.11
  1: 0.08 0.75 0.08 0.08
  2: 0.08 0.08 0.75 0.08
  3: 0.03 0.03 0.03 0.91
TESTING baum welch
Training HMM
Traceback (most recent call last):
  File "test_MarkovModel.py", line 64, in <module>
    p_emission=p_emission
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/MarkovModel.py", line 181, in _baum_welch
    if not p_initial.any():
AttributeError: any

$ python
Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import Numeric
>>> Numeric.__version__
'24.2'
>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named numpy
>>>

The above is on Mac OS X 10.5 (Tiger) with Numeric installed, but not numpy. I see something similar but slightly different on a Linux machine with both Numeric and an old version of numpy.

Looking at the CVS log, I wonder if this is due to the switch from an array based or, to an if based manipulation of p_initial, p_transition and p_emission?
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/MarkovModel.py.diff?r1=1.3&r2=1.4&cvsroot=biopython Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 18 11:21:59 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Sep 2008 11:21:59 -0400 Subject: [Biopython-dev] [Bug 2588] New: tutorial blast section uses undefined variables Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2588 Summary: tutorial blast section uses undefined variables Product: Biopython Version: 1.48 Platform: Other OS/Version: All Status: NEW Severity: trivial Priority: P2 Component: Documentation AssignedTo: biopython-dev at biopython.org ReportedBy: bsouthey at gmail.com Tutorial Section 6.6.2 'Parsing a file full of BLAST runs' has line: >>> blast_iterator = NCBIStandalone.Iterator(blast_handle, blast_parser) but 'blast_handle' is undefined. This line should probably be: >>> blast_iterator = NCBIStandalone.Iterator(result_handle, blast_parser) where result_handle is define in Section 6.6.1 'Parsing plain-text BLAST output': >>> result_handle = open("my_file_of_blast_output.txt") Also: >>> for b_record in b_iterator : probably should be: >>> for b_record in blast_iterator : -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From p.j.a.cock at googlemail.com Thu Sep 18 11:25:16 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Thu, 18 Sep 2008 16:25:16 +0100 Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py Message-ID: <320fb6e00809180825q796a7a1ay7fda222a77de678@mail.gmail.com> Michiel & Brad, It was my impression that for the next release of Biopython (or next few releases?) we would support either numpy or Numeric (decided at compile time for the C code, but at run time for pure-python modules). 
I notice that with CVS revision 1.5 of Bio/Affy/CelFile.py, this file only uses numpy (dropping support for Numeric). http://code.open-bio.org/cgi/viewcvs.cgi/biopython/Bio/Affy/CelFile.py.diff?r1=1.4&r2=1.5&cvsroot=biopython Was this just an oversight, or to resolve some incompatibility between Numeric and numpy? It would be nice to support both... Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 18 11:33:39 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Sep 2008 11:33:39 -0400 Subject: [Biopython-dev] [Bug 2588] tutorial blast section uses undefined variables In-Reply-To: Message-ID: <200809181533.m8IFXde1017191@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2588 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 11:33 EST ------- Well spotted! I've fixed those in CVS, plus made b_record into blast_record for consistency with the rest of the chapter. See biopython/Doc/Tutorial.tex CVS revision 1.159 http://code.open-bio.org/cgi/viewcvs.cgi/biopython/Doc/Tutorial.tex?cvsroot=biopython Thanks Bruce, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
From bugzilla-daemon at portal.open-bio.org Thu Sep 18 17:49:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 17:49:02 -0400
Subject: [Biopython-dev] [Bug 2589] New: Errors in running tests in 1.48
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2589

           Summary: Errors in running tests in 1.48
           Product: Biopython
           Version: 1.48
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: minor
          Priority: P2
         Component: Unit Tests
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: bsouthey at gmail.com

I downloaded BioPython 1.48 on x64 Linux Fedora rawhide (kernel 2.6.27-0.329.rc6.git2.fc10.x86_64)

$python setup.py build
$python setup.py test

The test that fails is:

ERROR: test_MarkovModel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_MarkovModel.py", line 61, in <module>
    p_emission=p_emission
  File "/home/bsouthey/bioinfo/biopython-1.48/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py", line 199, in _baum_welch
    lpseudo_initial, lpseudo_transition, lpseudo_emission,)
  File "/home/bsouthey/bioinfo/biopython-1.48/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py", line 255, in _baum_welch_one
    lp_initial[:] = lp_arcout_t[:,0]
ValueError: matrices are not aligned for copy

Also, these two tests are skipped with no explanation, because Fdist and SimCoal are not listed as required or optional software. They should be added to the list.

test_PopGen_FDist ... skipping. Fdist not found (not a problem if you do not intend to use it).
test_PopGen_SimCoal ... skipping. SimCoal not found (not a problem if you do not intend to use it).

No explanation of what this is:

test_GFF ... skipping. Environment is not configured for this test (not important if you do not plan to use Bio.GFF).
I know these ones are skipped because MySQL is not installed, but this test should be cleaner, especially since this involves optional software:

test_BioSQL ... skipping.
test_BioSQL_SeqIO ... skipping.

From bugzilla-daemon at portal.open-bio.org Thu Sep 18 18:33:36 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Thu, 18 Sep 2008 18:33:36 -0400
Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48
In-Reply-To:
Message-ID: <200809182233.m8IMXaai020481@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2589

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 18:33 EST -------
Hi Bruce,

The test_MarkovModel problem looks serious, but improvements to the messages from skipped tests are also worthwhile.

test_MarkovModel
================
This is interesting, I've not seen this before. What version of Numeric do you have? You can find out at the python prompt with:

    import Numeric
    print Numeric.__version__

test_PopGen_FDist and test_PopGen_SimCoal
=========================================
These are 3rd party population genetics tools. Do you think they should be listed under http://biopython.org/wiki/Download#Optional_Software ?

test_GFF
========
This unit test requires a GFF wormbase MySQL database to be set up, plus an environment variable for the password. This is fairly complicated to explain, hence "Environment is not configured for this test (not important if you do not plan to use Bio.GFF)."

test_BioSQL and test_BioSQL_SeqIO
=================================
These require a BioSQL database with python driver to be installed, plus the username and password etc to be given in setup_BioSQL.py. What message did you get exactly, and how would you suggest improving the message given?
Thanks, Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Sep 18 18:45:00 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Sep 2008 18:45:00 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200809182245.m8IMj0NI022825@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 18:45 EST ------- (In reply to comment #1) > test_PopGen_FDist and test_PopGen_SimCoal > ========================================= > These are 3rd party population genetics tools. Do you think they should > be listed under http://biopython.org/wiki/Download#Optional_Software I've added these on the wiki (and split the list into sections). Is that better now? -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Sep 18 18:52:27 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Sep 2008 18:52:27 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200809182252.m8IMqRTk024233@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #15 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-18 18:52 EST ------- For those not following the dev-mailing list, Numeric to numpy changes have begun to be checked into CVS. Brad said he had used Ed's patch for a lot of this - so thanks Ed! 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Thu Sep 18 23:01:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 18 Sep 2008 23:01:37 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200809190301.m8J31bmd012377@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #3 from bsouthey at gmail.com 2008-09-18 23:01 EST ------- (In reply to comment #1) > Hi Bruce, > > The test_MarkovModel problem looks serious, but improvements to the messages > from skipped tests are also worthwhile. > > test_MarkovModel > ================ > This is interesting, I've not seen this before. What version of Numeric do you > have? You can find out at the python prompt with: > > import Numeric > print Numeric.__version__ 24.2 Based on a Google search, this is a 64bit problem with Python 2.5 and Numeric. So either do: 1) Drop the [:] from the left-hand side: lp_initial = lp_arcout_t[:,0] 2) Do a loop: for bi in range(lp_initial.shape[0]): lp_initial[bi] = lp_arcout_t[bi,0] 3) Support NumPy - oh, wait already done... :-) > > test_PopGen_FDist and test_PopGen_SimCoal > ========================================= > These are 3rd party population genetics tools. Do you think they should be > listed under http://biopython.org/wiki/Download#Optional_Software Excellent! It is also good promo on what BioPython can do. > > test_GFF > ======== > This unit test requires a GFF wormbase MySQL database to be setup, plus an > environment variable for the password. This is fairly complicated to explain, > hence "Environment is not configured for this test (not important if you do not > plan to use Bio.GFF)." I did not see where GFF is mentioned so a link would be worthwhile. 
Also, this is probably the wrong place for the test or it should not be referenced unless asked for. > > test_BioSQL and test_BioSQL_SeqIO > ================================= > These require a BioSQL database with python driver to be installed plus the > username and password etc to be given in setup_BioSQL.py. What message did you > get exactly, and how would you suggest improving the message given? test_BioSQL ... skipping. Connection failed, check settings in Tests/setup_BioSQL.py if you plan to use BioSQL: (2002, "Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)") ok test_BioSQL_SeqIO ... skipping. Connection failed, check settings in Tests/setup_BioSQL.py if you plan to use BioSQL: (2002, "Can't connect to local MySQL server through socket '/var/lib/mysql/mysql.sock' (2)") ok I initially thought to note these in the installation, but after looking at the BioSQL page, these MySQL tests should not be run to test BioPython. These are BioSQL tests; as such, they should be run after MySQL and BioSQL have been set up. So these should not be tested unless asked for. > > Thanks, > > Peter > No, thanks to all the developers as this is too minor. Bruce -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From mjldehoon at yahoo.com Fri Sep 19 09:05:31 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 19 Sep 2008 06:05:31 -0700 (PDT) Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py In-Reply-To: <320fb6e00809180825q796a7a1ay7fda222a77de678@mail.gmail.com> Message-ID: <78171.17163.qm@web62406.mail.re1.yahoo.com> Actually, I was under the impression that the latest consensus was to go to NumPy directly. It's quite complicated to support both NumPy and Numerical Python, at least at the C level. --Michiel.
--- On Thu, 9/18/08, Peter Cock wrote: > From: Peter Cock > Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py > To: "BioPython-Dev Mailing List" > Date: Thursday, September 18, 2008, 11:25 AM > Michiel & Brad, > > It was my impression that for the next release of Biopython > (or next > few releases?) we would support either numpy or Numeric > (decided at > compile time for the C code, but at run time for > pure-python modules). > > I notice that with CVS revision 1.5 of Bio/Affy/CelFile.py, > this file > only uses numpy (dropping support for Numeric). > http://code.open-bio.org/cgi/viewcvs.cgi/biopython/Bio/Affy/CelFile.py.diff?r1=1.4&r2=1.5&cvsroot=biopython > > Was this just an oversight, or to resolve some > incompatibility between > Numeric and numpy? It would be nice to support both... > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From p.j.a.cock at googlemail.com Fri Sep 19 09:57:19 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 19 Sep 2008 14:57:19 +0100 Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py In-Reply-To: <78171.17163.qm@web62406.mail.re1.yahoo.com> References: <320fb6e00809180825q796a7a1ay7fda222a77de678@mail.gmail.com> <78171.17163.qm@web62406.mail.re1.yahoo.com> Message-ID: <320fb6e00809190657j662b8824n6be5ac593c13aaef@mail.gmail.com> On Fri, Sep 19, 2008 at 2:05 PM, Michiel de Hoon wrote: > Actually, I was under the impression that the latest consensus was to go to NumPy directly. It's quite complicated to support both NumPy and Numerical Python, at least at the C level. I was assuming dual support for both numpy or Numeric for the next release based on code like this: try: from Numeric import x, y, z except ImportError: from numpy.oldnumeric import x, y, z where I assumed the C code would have been decided at compile time. 
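A minimal, runnable sketch of the fallback-import pattern quoted above, generalised so that it can be tried without Numeric installed. The helper name `import_first_available` and the stand-in module names are hypothetical and chosen only for illustration; this is not how Biopython itself is written.

```python
import importlib


def import_first_available(candidates):
    """Return (name, module) for the first module name that can be imported."""
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


# "no_such_array_library" stands in for a missing Numeric install,
# and "math" stands in for the library we fall back on.
backend_name, backend = import_first_available(["no_such_array_library", "math"])
print("using", backend_name)
```

Keeping the name of the module that was actually loaded makes it easy to report the active backend in test output.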
If a simple switch from Numeric to numpy is what you and Brad had in mind, that's OK with me but in the python code we should just use simple imports from numpy only. Peter From mjldehoon at yahoo.com Fri Sep 19 11:03:47 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 19 Sep 2008 08:03:47 -0700 (PDT) Subject: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py In-Reply-To: <320fb6e00809190657j662b8824n6be5ac593c13aaef@mail.gmail.com> Message-ID: <828325.7957.qm@web62406.mail.re1.yahoo.com> > try: > from Numeric import x, y, z > except ImportError: > from numpy.oldnumeric import x, y, z This is the easy part. Keep in mind though that the "from numpy.oldnumeric import x, y, z" approach is only a temporary solution; at some point, the oldnumeric wrapper will disappear from numpy. > where I assumed the C code would have been decided at > compile time. This is the complicated part; it's not just replacing one #include with another. We'd have to use a bunch of #ifdefs to separate the old code from the new code. Anyway I was planning to go through the Numerical Python - dependent code to see if any other changes are needed. If anybody wants to be able to use the old Numerical Python, please let yourself be heard; otherwise I suggest we go directly to NumPy. --Michiel --- On Fri, 9/19/08, Peter Cock wrote: > From: Peter Cock > Subject: Re: [Biopython-dev] Numeric/numpy - Bio/Affy/CelFile.py > To: mjldehoon at yahoo.com > Cc: "BioPython-Dev Mailing List" > Date: Friday, September 19, 2008, 9:57 AM > On Fri, Sep 19, 2008 at 2:05 PM, Michiel de Hoon > wrote: > > Actually, I was under the impression that the latest > consensus was to go to NumPy directly. It's quite > complicated to support both NumPy and Numerical Python, at > least at the C level. 
> > I was assuming dual support for both numpy or Numeric for > the next > release based on code like this: > > try: > from Numeric import x, y, z > except ImportError: > from numpy.oldnumeric import x, y, z > > where I assumed the C code would have been decided at > compile time. > > If a simple switch from Numeric to numpy is what you and > Brad had in > mind, that's OK with me but in the python code we > should just use > simple imports from numpy only. > > Peter From p.j.a.cock at googlemail.com Fri Sep 19 11:42:26 2008 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Fri, 19 Sep 2008 16:42:26 +0100 Subject: [Biopython-dev] Numeric/numpy Message-ID: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com> Michiel wrote: >Peter wrote: >> I was assuming dual support for both numpy or Numeric for the next >> release based on code like this: >> >> try: >> from Numeric import x, y, z >> except ImportError: >> from numpy.oldnumeric import x, y, z > > This is the easy part. Keep in mind though that the "from numpy.oldnumeric import x, y, z" approach is only a temporary solution; at some point, the oldnumeric wrapper will disappear from numpy. Yes, if/when the oldnumeric wrapper goes away we'll have more work to do. Something to worry about later. >> where I assumed the C code would have been decided at >> compile time. > > This is the complicated part; it's not just replacing one #include with another. We'd have to use a bunch of #ifdefs to separate the old code from the new code. > > Anyway I was planning to go through the Numerical Python - dependent code to see if any other > changes are needed. If anybody wants to be able to use the old Numerical Python, please let > yourself be heard; otherwise I suggest we go directly to NumPy. > > --Michiel That suits me - how about we post something like this on the main discussion list then?: Dear all, As you probably are well aware, Biopython releases to date have used the now obsolete Numeric python library. 
This is no longer being maintained and has been superseded by the numpy library. See http://www.scipy.org/History_of_SciPy for more about details on the history of numerical python. Biopython 1.48 should be the last Numeric only release of Biopython - we have already started moving to numpy in CVS. Supporting both Numeric and numpy ought to be fairly straight forward for the pure python modules in Biopython. However, we also have C code which must interact with Numeric/numpy, and trying to support both would be harder. Would anyone be inconvenienced if the next release of Biopython supported numpy ONLY (dropping support for Numeric)? If so please speak up now - either here or on the development mailing list. Otherwise, a simple switch from Numeric to numpy will probably be the most straightforward migration plan. Thank you, ... From bugzilla-daemon at portal.open-bio.org Fri Sep 19 14:26:17 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 19 Sep 2008 14:26:17 -0400 Subject: [Biopython-dev] [Bug 2591] New: GenBank files misparsed for long organism names Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2591 Summary: GenBank files misparsed for long organism names Product: Biopython Version: 1.47 Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: joelb at lanl.gov I've noticed a problem with BioPython 1.47 mis-parsing the organism and lineage in GenBank files from certain bacteria. All of the problem organisms have names longer than 61 characters, and a line wrap is introduced into the SOURCE and ORGANISM records, which causes the mis-parsing. My reading of the GenBank file docs says that lines should be of variable length rather than being split, so it appears this bug is GenBank's problem rather than BioPython's. I have sent e-mail to info at ncbi.nlm.nih.gov about the issue just now. 
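The wrap described above can be illustrated with a small sketch of one possible workaround. It is not Biopython's parser. It assumes (and this is only an assumption) that the first line of an ORGANISM block always belongs to the name, and that later lines belong to the name until the first line containing a semicolon, after which everything is lineage. The function name `split_organism_block` is made up for this example, and the line breaks in the sample are assumed; the text itself comes from the NC_011147 record.

```python
def split_organism_block(lines):
    """Split the lines of an ORGANISM block into (name, lineage).

    Assumed heuristic: the first line is always part of the name; later
    lines are name continuations until the first line containing a
    semicolon, and everything from that line onward is lineage.
    """
    stripped = [line.strip() for line in lines if line.strip()]
    name_parts = stripped[:1]
    rest = stripped[1:]
    index = 0
    while index < len(rest) and ";" not in rest[index]:
        name_parts.append(rest[index])
        index += 1
    # Join the lineage lines, drop the trailing period, split on semicolons.
    lineage_text = " ".join(rest[index:]).rstrip(".")
    lineage = [taxon.strip() for taxon in lineage_text.split(";") if taxon.strip()]
    return " ".join(name_parts), lineage


# Line breaks below are assumed for illustration only.
sample = [
    "Salmonella enterica subsp. enterica serovar Paratyphi A str.",
    "AKU_12601",
    "Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales;",
    "Enterobacteriaceae; Salmonella.",
]
name, lineage = split_organism_block(sample)
print(name)
print(lineage)
```

A known limitation of this sketch: a lineage consisting of a single taxon has no semicolon, so it would be treated as part of the name.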
GenBank doesn't seem to have a bug tracker, though, so I'm writing the issue here to document it for other people. The issue exists for a number of organisms (more than 6, though I haven't done the exact count). One example may be found at ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Salmonella_enterica_serovar_Paratyphi_A_AKU_12601/NC_011147.gbk or http://tinyurl.com/47yg5g When parsing this file, the taxonomy list returned begins with ["AKU_12601 Bacteria","Proteobacteria"... Some of the other examples have made it onto web sites which have included the mis-parsed data, e.g. Superfam http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/cgi-bin/gen_list.cgi?genome=x6 which shows the error for Salmonella enterica subsp. enterica serovar Choleraesuis str. SC-B67. I'll append the response from GenBank to this bug if and when I get one. If I don't get one, then I'll try to come up with a workaround. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Sep 19 15:05:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 19 Sep 2008 15:05:45 -0400 Subject: [Biopython-dev] [Bug 2591] GenBank files misparsed for long organism names In-Reply-To: Message-ID: <200809191905.m8JJ5jUY028741@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2591 ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-19 15:05 EST ------- That file starts as follows: LOCUS NC_011147 4581797 bp DNA circular BCT 29-AUG-2008 DEFINITION Salmonella enterica subsp. enterica serovar Paratyphi A str. AKU_12601, complete genome. ACCESSION NC_011147 VERSION NC_011147.1 GI:197361212 KEYWORDS complete genome. SOURCE Salmonella enterica subsp. enterica serovar Paratyphi A str. AKU_12601 ORGANISM Salmonella enterica subsp. 
enterica serovar Paratyphi A str. AKU_12601 Bacteria; Proteobacteria; Gammaproteobacteria; Enterobacteriales; Enterobacteriaceae; Salmonella. REFERENCE 1 ... The multiline DEFINITION and SOURCE should be fine. However, we expect ORGANISM to be a single line followed by a multiline taxonomy lineage - hence the problem you observed. This may well be an NCBI bug but it seems likely this kind of problem will occur more often in future as more and more (sub)strains of bacteria are sequenced, requiring longer names. Let's wait and hear what the NCBI says - I expect they will have to change the file format definition slightly. If they say this is a valid file, I hope they will also explain officially how we should split up the species and its lineage. One option would be something like looking for semi-colons in the following text as indicative of the lineage (rather than as more of the ORGANISM). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From chapmanb at 50mail.com Fri Sep 19 18:34:20 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Fri, 19 Sep 2008 18:34:20 -0400 Subject: [Biopython-dev] Numeric/numpy In-Reply-To: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com> References: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com> Message-ID: <20080919223420.GA13009@localdomain> Peter; Michiel covered most everything here. My initial check-ins are basically the try/except you describe and it looks like Michiel has gone further and worked on real NumPy transitions. My opinion is to post that message to the main list and move forward with converting to NumPy exclusively as people are able to tackle the task for different modules. Once something has been converted in a real way, and not just using oldnumeric imports, then the try/except can go away.
I suspect not too many people will still be stuck on Numerical, and should be excited to get up to date with that library. Brad > Michiel wrote: > >Peter wrote: > >> I was assuming dual support for both numpy or Numeric for the next > >> release based on code like this: > >> > >> try: > >> from Numeric import x, y, z > >> except ImportError: > >> from numpy.oldnumeric import x, y, z > > > > This is the easy part. Keep in mind though that the "from numpy.oldnumeric import x, y, z" approach is only a temporary solution; at some point, the oldnumeric wrapper will disappear from numpy. > > Yes, if/when the oldnumeric wrapper goes away we'll have more work to > do. Something to worry about later. > > >> where I assumed the C code would have been decided at > >> compile time. > > > > This is the complicated part; it's not just replacing one #include with another. We'd have to use a bunch of #ifdefs to separate the old code from the new code. > > > > Anyway I was planning to go through the Numerical Python - dependent code to see if any other > > changes are needed. If anybody wants to be able to use the old Numerical Python, please let > > yourself be heard; otherwise I suggest we go directly to NumPy. > > > > --Michiel > > That suits me - how about we post something like this on the main > discussion list then?: > > Dear all, > > As you probably are well aware, Biopython releases to date have used > the now obsolete Numeric python library. This is no longer being > maintained and has been superseded by the numpy library. See > http://www.scipy.org/History_of_SciPy for more about details on the > history of numerical python. Biopython 1.48 should be the last > Numeric only release of Biopython - we have already started moving to > numpy in CVS. > > Supporting both Numeric and numpy ought to be fairly straight forward > for the pure python modules in Biopython. 
However, we also have C code > which must interact with Numeric/numpy, and trying to support both > would be harder. > > Would anyone be inconvenienced if the next release of Biopython > supported numpy ONLY (dropping support for Numeric)? If so please > speak up now - either here or on the development mailing list. > Otherwise, a simple switch from Numeric to numpy will probably be the > most straightforward migration plan. > > Thank you, > > ... > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Brad Chapman Codon Devices http://www.codondevices.com From mjldehoon at yahoo.com Fri Sep 19 23:00:09 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 19 Sep 2008 20:00:09 -0700 (PDT) Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work? In-Reply-To: <320fb6e00809180800ic2752dfoe00801f67b57c65c@mail.gmail.com> Message-ID: <200283.98234.qm@web62404.mail.re1.yahoo.com> This is an example where the old Numerical Python and the new NumPy need different code at the Python-level. The Numerical Python -dependent code was: p_initial = _safe_copy_and_check(p_initial, (N,)) or _random_norm(N) Here, _safe_copy_and_check returns an array. NumPy does not allow interpreting an array as a boolean, so instead we have to use: p_initial = _safe_copy_and_check(p_initial, (N,)) if not p_initial.any(): p_initial = _random_norm(N) which is the code Brad uploaded to CVS. However, Numerical Python arrays don't have the .any() method, so this fails with the old Numerical Python. Let's first see if anybody wants to continue using the old Numerical Python. If so, we can add some try:except: around the call to p_initial.any(). If not, then Brad's code is fine. --Michiel. --- On Thu, 9/18/08, Peter wrote: > From: Peter > Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work? 
> To: "BioPython-Dev Mailing List" > Date: Thursday, September 18, 2008, 11:00 AM > Hi all, > > I'm back from my one week holiday, and after I updated > my machine I'm > seeing a new failure in test_MarkovModel.py is probably > related to the > Numeric/numpy work, > > $ python run_tests.py test_MarkovModel.py > test_MarkovModel ... ERROR > (output cut) > > $python test_MarkovModel.py > TESTING train_visible > Training HMM > Classifying > [(['0', '0', '1', '2', > '3', '3'], 0.0082128906250000053)] > STATES: 0 1 2 3 > ALPHABET: A C G T > INITIAL: > 0: 1.00 > 1: 0.00 > 2: 0.00 > 3: 0.00 > TRANSITION: > 0: 0.20 0.80 0.00 0.00 > 1: 0.00 0.50 0.50 0.00 > 2: 0.00 0.00 0.50 0.50 > 3: 0.00 0.00 0.00 1.00 > EMISSION: > 0: 0.67 0.11 0.11 0.11 > 1: 0.08 0.75 0.08 0.08 > 2: 0.08 0.08 0.75 0.08 > 3: 0.03 0.03 0.03 0.91 > TESTING baum welch > Training HMM > Traceback (most recent call last): > File "test_MarkovModel.py", line 64, in > > p_emission=p_emission > File > "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/Bio/MarkovModel.py", > line 181, in _baum_welch > if not p_initial.any(): > AttributeError: any > > $ python > Python 2.5.2 (r252:60911, Feb 22 2008, 07:57:53) > [GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin > Type "help", "copyright", > "credits" or "license" for more > information. > >>> import Numeric > >>> Numeric.__version__ > '24.2' > >>> import numpy > Traceback (most recent call last): > File "", line 1, in > ImportError: No module named numpy > >>> > > The above is on Mac OS X 10.5 (Tiger) with Numeric > installed, but not > numpy. I see something similar but slightly different on a > Linux > machine with both Numeric and an old version of numpy. > > Looking at the CVS log, I wonder if this is due to the > switch from an > array based or, to an if based manipulation of p_initial, > p_transition > and p_emission? 
> http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/MarkovModel.py.diff?r1=1.3&r2=1.4&cvsroot=biopython > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From mjldehoon at yahoo.com Fri Sep 19 23:01:18 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Fri, 19 Sep 2008 20:01:18 -0700 (PDT) Subject: [Biopython-dev] Numeric/numpy In-Reply-To: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com> Message-ID: <415448.81411.qm@web62403.mail.re1.yahoo.com> OK, I'll send your message to the biopython mailing list. --Michiel. --- On Fri, 9/19/08, Peter Cock wrote: > From: Peter Cock > Subject: Re: [Biopython-dev] Numeric/numpy > To: mjldehoon at yahoo.com > Cc: "BioPython-Dev Mailing List" > Date: Friday, September 19, 2008, 11:42 AM > Michiel wrote: > >Peter wrote: > >> I was assuming dual support for both numpy or > Numeric for the next > >> release based on code like this: > >> > >> try: > >> from Numeric import x, y, z > >> except ImportError: > >> from numpy.oldnumeric import x, y, z > > > > This is the easy part. Keep in mind though that the > "from numpy.oldnumeric import x, y, z" approach is > only a temporary solution; at some point, the oldnumeric > wrapper will disappear from numpy. > > Yes, if/when the oldnumeric wrapper goes away we'll > have more work to > do. Something to worry about later. > > >> where I assumed the C code would have been decided > at > >> compile time. > > > > This is the complicated part; it's not just > replacing one #include with another. We'd have to use a > bunch of #ifdefs to separate the old code from the new code. > > > > Anyway I was planning to go through the Numerical > Python - dependent code to see if any other > > changes are needed. 
If anybody wants to be able to use > the old Numerical Python, please let > > yourself be heard; otherwise I suggest we go directly > to NumPy. > > > > --Michiel > > That suits me - how about we post something like this on > the main > discussion list then?: > > Dear all, > > As you probably are well aware, Biopython releases to date > have used > the now obsolete Numeric python library. This is no longer > being > maintained and has been superseded by the numpy library. > See > http://www.scipy.org/History_of_SciPy for more about > details on the > history of numerical python. Biopython 1.48 should be the > last > Numeric only release of Biopython - we have already started > moving to > numpy in CVS. > > Supporting both Numeric and numpy ought to be fairly > straight forward > for the pure python modules in Biopython. However, we also > have C code > which must interact with Numeric/numpy, and trying to > support both > would be harder. > > Would anyone be inconvenienced if the next release of > Biopython > supported numpy ONLY (dropping support for Numeric)? If so > please > speak up now - either here or on the development mailing > list. > Otherwise, a simple switch from Numeric to numpy will > probably be the > most straightforward migration plan. > > Thank you, > > ... 
From biopython at maubp.freeserve.co.uk Sat Sep 20 07:31:20 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Sat, 20 Sep 2008 12:31:20 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com> References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com> <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com> <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com> Message-ID: <320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com> On Thu, Sep 18, 2008 at 1:15 PM, Peter wrote: > I wrote: >>> Bio.expressions was already deprecated, and seems to be a dependency >>> of the following modules, which I have now explicitly deprecated in CVS: > > I plan to remove these four deprecated modules shortly, unless anyone objects: > > Bio.expressions (deprecated in Biopython 1.44) > Bio.config (explicitly deprecated in Biopython 1.48) > Bio.dbdefs (explicitly deprecated in Biopython 1.48) > Bio.formatdefs (explicitly deprecated in Biopython 1.48) > > At the same time I would remove the associated bit of unused code in > Bio/__init__.py Done in CVS now. Peter From biopython at maubp.freeserve.co.uk Mon Sep 22 09:46:48 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 14:46:48 +0100 Subject: [Biopython-dev] test_MarkovModel.py and Numeric/numpy work? In-Reply-To: <200283.98234.qm@web62404.mail.re1.yahoo.com> References: <320fb6e00809180800ic2752dfoe00801f67b57c65c@mail.gmail.com> <200283.98234.qm@web62404.mail.re1.yahoo.com> Message-ID: <320fb6e00809220646s1a1ad59dvb83990c69402345e@mail.gmail.com> On Sat, Sep 20, 2008 at 4:00 AM, Michiel de Hoon wrote: > > This is an example where the old Numerical Python and the new NumPy need different code at the Python-level. 
> ... Thanks for the explanation :) > Let's first see if anybody wants to continue using the old Numerical Python. > If so, we can add some try:except: around the call to p_initial.any(). If not, > then Brad's code is fine. I've added a try/except to stop the failing unit test when Numeric is installed. This should now work with either Numeric or numpy. Having the current fall back import system (trying to import Numeric, falling back on importing numpy) makes sense for transition releases with support for both. However, if we all agree to do a straight switch from Numeric to numpy for Biopython 1.49, then I think we shouldn't try importing from Numeric at all. Peter From biopython at maubp.freeserve.co.uk Mon Sep 22 10:32:59 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Mon, 22 Sep 2008 15:32:59 +0100 Subject: [Biopython-dev] BOSC 2008 presentation Message-ID: <320fb6e00809220732x26350cb2ie71051fd15c2770e@mail.gmail.com> Peter wrote: >>> >>> This reminds me that I could/should make a PDF version of the BOSC >>> 2008 slides to go online here: >>> http://biopython.org/wiki/Documentation#Presentations >>> I've managed to turn the powerpoint version of the Biopython BOSC 2008 talk into a PDF file which is now online. I had to tweak some font settings (powerpoint on the Mac doesn't show things exactly as it does on a PC), but this should match up with the version on slideshare. If anyone spots any mistakes or discrepancies worth fixing, please let me know. 
Peter From bugzilla-daemon at portal.open-bio.org Mon Sep 22 10:54:35 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 22 Sep 2008 10:54:35 -0400 Subject: [Biopython-dev] [Bug 2592] New: numpy migration for Bio.PDB.Vector Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2592 Summary: numpy migration for Bio.PDB.Vector Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Other AssignedTo: biopython-dev at biopython.org ReportedBy: meesters at uni-mainz.de see http://lists.open-bio.org/pipermail/biopython/2008-September/004505.html The code is pretty similar to the original one. I don't mind, if it won't be used. Vector.py: from scipy.linalg import det #determinant from numpy import allclose, arccos, array, cos, dot, eye, float32, matrix, sin, \ sqrt, sum, trace, transpose, zeros from math import acos def m2rotaxis(m): """ Return angles, axis pair that corresponds to rotation matrix m. """ # Angle always between 0 and pi # Sense of rotation is defined by axis orientation t=0.5*(trace(m)-1) t=max(-1, t) t=min(1, t) angle=acos(t) if angle<1e-15: # Angle is 0 return 0.0, Vector(1,0,0) elif anglem11 and m00>m22: x=sqrt(m00-m11-m22+0.5) y=m[0,1]/(2*x) z=m[0,2]/(2*x) elif m11>m00 and m11>m22: y=sqrt(m11-m00-m22+0.5) x=m[0,1]/(2*y) z=m[1,2]/(2*y) else: z=sqrt(m22-m00-m11+0.5) x=m[0,2]/(2*z) y=m[1,2]/(2*z) axis=Vector(x,y,z) axis.normalize() return pi, axis def vector_to_axis(line, point): """ Returns the vector between a point and the closest point on a line (ie. the perpendicular projection of the point on the line). @type line: L{Vector} @param line: vector defining a line @type point: L{Vector} @param point: vector defining the point """ line=line.normalized() np=point.norm() angle=line.angle(point) return point-line**(np*cos(angle)) def calc_angle(v1, v2, v3): """ Calculate the angle between 3 vectors representing 3 connected points. 
@param v1, v2, v3: the tree points that define the angle @type v1, v2, v3: L{Vector} @return: angle @rtype: float """ v1=v1-v2 v3=v3-v2 return v1.angle(v3) def calc_dihedral(v1, v2, v3, v4): """ Calculate the dihedral angle between 4 vectors representing 4 connected points. The angle is in ]-pi, pi]. @param v1, v2, v3, v4: the four points that define the dihedral angle @type v1, v2, v3, v4: L{Vector} """ ab=v1-v2 cb=v3-v2 db=v4-v3 u=ab**cb v=db**cb w=u**v angle=u.angle(v) # Determine sign of angle try: if cb.angle(w)>0.001: angle=-angle except ZeroDivisionError: # dihedral=pi pass return angle def rotaxis(theta, vector): """ Calculate a left multiplying rotation matrix that rotates theta rad around vector. Example: >>> m=rotaxis(pi, Vector(1,0,0)) >>> rotated_vector=any_vector.left_multiply(m) @type theta: float @param theta: the rotation angle @type vector: L{Vector} @param vector: the rotation axis @return: The rotation matrix, a 3x3 Numeric array. """ vector=vector.copy() vector.normalize() c=cos(theta) s=sin(theta) t=1-c x,y,z=vector.get_array() rot=zeros((3,3), "d") # 1st row rot[0,0]=t*x*x+c rot[0,1]=t*x*y-s*z rot[0,2]=t*x*z+s*y # 2nd row rot[1,0]=t*x*y+s*z rot[1,1]=t*y*y+c rot[1,2]=t*y*z-s*x # 3rd row rot[2,0]=t*x*z-s*y rot[2,1]=t*y*z+s*x rot[2,2]=t*z*z+c return rot def refmat(p,q): """ Return a (left multiplying) matrix that mirrors p onto q. Example: >>> mirror=refmat(p,q) >>> qq=p.left_multiply(mirror) >>> print q, qq # q and qq should be the same @type p,q: L{Vector} @return: The mirror operation, a 3x3 Numeric array. """ p.normalize() q.normalize() if (p-q).norm()<1e-5: return eye(3) pq=p-q pq.normalize() b=pq.get_array() b.shape=(3, 1) i=eye(3) ref=i-2* dot(b, transpose(b)) return ref def rotmat(p,q): """ Return a (left multiplying) matrix that rotates p onto q. 
    Example:
        >>> r=rotmat(p,q)
        >>> print q, p.left_multiply(r)

    @param p: moving vector
    @type p: L{Vector}

    @param q: fixed vector
    @type q: L{Vector}

    @return: rotation matrix that rotates p onto q
    @rtype: 3x3 Numeric array
    """
    rot=refmat(q, -p) * refmat(p, -p).transpose()
    return rot


class Vector(object):
    "3D vector"

    def __init__(self, x, y=None, z=None):
        if y is None and z is None:
            # Array, list, tuple...
            if len(x)!=3:
                raise ValueError("Vector: x is not a list/tuple/array of 3 numbers")
            self._ar=array(x)
        else:
            # Three numbers
            self._ar=array([x, y, z])

    def __eq__(self, other):
        return allclose(self._ar, other._ar, 0.01)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __repr__(self):
        x, y, z = self._ar
        return "<Vector %.2f, %.2f, %.2f>" % (x, y, z)

    def __neg__(self):
        "Return Vector(-x, -y, -z)"
        return Vector(-self._ar)

    def __add__(self, other):
        "Return Vector+other Vector or scalar"
        if isinstance(other, Vector):
            a=self._ar+other._ar
        else:
            a=self._ar+array(other)
        return Vector(a)

    def __sub__(self, other):
        "Return Vector-other Vector or scalar"
        if isinstance(other, Vector):
            a=self._ar-other._ar
        else:
            a=self._ar-array(other)
        return Vector(a)

    def __mul__(self, other):
        "Return Vector.Vector (dot product)"
        return sum(self._ar*other._ar)

    def __div__(self, x):
        "Return Vector(coords/a)"
        a=self._ar/array(x)
        return Vector(a)

    def __pow__(self, other):
        "Return VectorxVector (cross product) or Vectorxscalar"
        if isinstance(other, Vector):
            a,b,c=self._ar
            d,e,f=other._ar
            c1=det(array(((b,c), (e,f))))
            c2=-det(array(((a,c), (d,f))))
            c3=det(array(((a,b), (d,e))))
            return Vector(c1,c2,c3)
        else:
            a=self._ar*array(other)
            return Vector(a)

    def __getitem__(self, i):
        return self._ar[i]

    def __setitem__(self, i, value):
        self._ar[i]=value

    def norm(self):
        "Return vector norm"
        return sqrt(sum(self._ar*self._ar))

    def normsq(self):
        "Return square of vector norm"
        return abs(sum(self._ar*self._ar))

    def normalize(self):
        "Normalize the Vector"
        self._ar=self._ar/self.norm()

    def normalized(self):
        "Return a normalized copy of the Vector"
        v = self.copy()
        v.normalize()
        return v

    def angle(self, other):
        "Return angle between two vectors"
        n1=self.norm()
        n2=other.norm()
        c=(self*other)/(n1*n2)
        # Take care of roundoff errors
        c=min(c,1)
        c=max(-1,c)
        return arccos(c)

    def get_array(self):
        "Return (a copy of) the array of coordinates"
        return array(self._ar)

    def left_multiply(self, matrix):
        "Return Vector=Matrix x Vector"
        return Vector(dot(matrix, self._ar))

    def right_multiply(self, matrix):
        "Return Vector=Vector x Matrix"
        return Vector(dot(self._ar, matrix))

    def copy(self):
        "Return a deep copy of the Vector"
        return Vector(self._ar)

#xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

test_Vector.py:

import unittest
from math import pi, degrees
from numpy import array, allclose, transpose
from Vector import Vector, calc_angle, calc_dihedral, refmat, rotmat, \
    rotaxis, vector_to_axis


class TestVectorFunctions(unittest.TestCase):
    """
    Vector-class test functions
    """
    def setUp(self):
        self.v1 = Vector(0,0,1)
        self.v2 = Vector(0,0,0)
        self.v3 = Vector(0,1,0)
        self.v4 = Vector(1,1,0)
        self.v5 = Vector(-1,-1,0)
        self.ref_array = array([[ 1.0, 0.0, 0.0],
                                [ 0.0, 0.0, 1.0],
                                [ 0.0, 1.0, 0.0]])
        self.tolerance = 0.001

    def test__eq__(self):
        """Vector.__eq__ should return Boolean for equality check on two Vectors: testing True"""
        self.assert_(self.v1 == self.v1)

    def test__eq__2(self):
        """Vector.__eq__ should return Boolean for equality check on two Vectors: testing False"""
        self.failIf(self.v1 == self.v2)

    def test__ne__(self):
        """Vector.__ne__ should return Boolean for non-equal Vectors: testing True"""
        self.assert_(self.v1 != self.v2)

    def test__ne__2(self):
        """Vector.__ne__ should return Boolean for non-equal Vectors: testing False"""
        self.failIf(self.v1 != self.v1)

    def test__repr__(self):
        """Vector.__repr__ should return a Vector-object as a nice string"""
        self.assertEqual(repr(self.v1), "<Vector 0.00, 0.00, 1.00>")

    def test__neg__(self):
        """Vector.__neg__ should Vector-object * -1"""
        v = Vector(0,0,-1)
        self.assertEqual(-self.v1, v)

    def test__add__(self):
        """testing Vector.__add__: Vector + Vector"""
        v = Vector(1,1,1)
        v2 = Vector(1,1,2)
        self.assertEqual(self.v1+v, v2)

    def test__add__2(self):
        """testing Vector.__add__: Vector + scalar"""
        v = Vector(3,3,4)
        self.assertEqual(self.v1+3, v)

    def test__add__3(self):
        """testing Vector.__add__: Vector + scalars"""
        v = Vector(1,2,4)
        self.assertEqual(self.v1+(1,2,3), v)

    def test__sub__(self):
        """testing Vector.__sub__(): Vector - Vector"""
        self.assertEqual(self.v1-self.v1, self.v2)

    def test__sub__2(self):
        """testing Vector.__sub__(): Vector-scalar"""
        self.assertEqual(self.v1-1, self.v5)

    def test__sub__3(self):
        """testing Vector.__sub__(): Vector-scalars"""
        v = Vector(-1,-2,-2)
        self.assertEqual(self.v1-(1,2,3), v)

    def test__mul__(self):
        """testing Vector.__mul__()"""
        self.assertEqual(self.v1 * self.v2, 0)

    def test__pow__(self):
        """testing Vector.__pow__()"""
        self.assertEqual(self.v1** self.v2, self.v2)

    def test__getitem__(self):
        """testing Vector.__getitem__"""
        self.assertEqual(self.v1[0], 0)

    def test__setitem__(self):
        """testing Vector.__setitem__"""
        v = self.v3
        v[0] = 1
        self.assertEqual(v, self.v4)

    def testNorm(self):
        """testing Vector.norm()"""
        self.assertEqual(self.v4, self.v4)

    def testNormsq(self):
        """testing Vector.normsq()"""
        self.assertEqual(self.v4, self.v4)

    def testNomalize(self):
        """testing Vector.normalize()"""
        self.v4.normalize()
        v = Vector(0.71, 0.71, 0.00)
        self.assertEqual(self.v4, v)

    def testNomalized(self):
        """testing Vector.normalized()"""
        self.v4.normalize()
        v = Vector(0.71, 0.71, 0.00)
        self.assertEqual(self.v4, v)

    def testAngle(self):
        """testing Vector.angle()"""
        self.assertEqual(degrees(self.v2.angle(self.v1)), 180)

    def testGetarray(self):
        """testing Vector.get_array()"""
        self.assert_(all(self.v1.get_array() == array((0,0,1))))

    def testCopy(self):
        """testing Vector.copy()"""
        self.assertEqual(self.v1.copy(), self.v1)

    def testCalcangle(self):
        """testing calc_angle()"""
        self.assertEqual(degrees(calc_angle(self.v1, self.v2, self.v3)), 90.0)

    def testRefmat(self):
        """testing refmat()"""
        self.assert_(allclose(refmat(self.v1, self.v3), self.ref_array, self.tolerance))

    def testRotmat(self):
        """testing rotmat()"""
        self.assert_(allclose(refmat(self.v1, self.v3), self.ref_array, self.tolerance))

    def testLeftmultiply(self):
        """testing Vector.leftmultiply()"""
        self.assertEqual(self.v1.left_multiply(self.ref_array), self.v3)

    def testRightmultiply(self):
        """testing Vector.rightmultiply()"""
        self.assertEqual(self.v1.right_multiply(transpose(self.ref_array)), self.v3)

    def testRotaxis(self):
        """testing rotaxis()"""
        a = array([[ -1.0, 0, 0.0],
                   [0.0, -1.0, 0.0],
                   [0.0, 0.0, 1.0]])
        self.assert_(allclose(rotaxis(pi, self.v1), a, self.tolerance))

    def testVector_to_axis(self):
        """testing vector_to_axis"""
        self.assertEqual(vector_to_axis(self.v5, self.v1), self.v1)

    def testCalc_dihedral(self):
        """testing calc_dihedral"""
        self.assertEqual(degrees(calc_dihedral(self.v1, self.v2, self.v3, self.v4)), 90)

-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

From biopython at maubp.freeserve.co.uk Mon Sep 22 13:39:46 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Mon, 22 Sep 2008 18:39:46 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com>
References: <492634.64872.qm@web62414.mail.re1.yahoo.com>
 <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com>
 <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com>
 <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com>
 <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com>
 <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com>
 <320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com>
Message-ID: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com>

As part of the general Martel/Mindy clean up, I've added a deprecation
warning to Mindy and several other closely related modules, but have
only made a docstring change to Martel. I'm not sure if we should add
a deprecation warning to Martel directly - it would be triggered by
running the Biopython setup.py file, which is nasty. Perhaps for this
special case, documentation is enough?

Summary:

* Martel - labelled as deprecated for 1.49, but no explicit warning (see above)
* Bio.Mindy - deprecated for 1.49
* Bio.Std - deprecated for 1.49
* Bio.StdHandler - deprecated for 1.49
* Bio.builders - deprecated for 1.49
* Bio.Decode - deprecated for 1.49
* Bio.Writer (and Bio.writers.*) - deprecated in 1.48
* Bio.expressions - deprecated in 1.44, removed for 1.49
* Bio.config - effectively deprecated in 1.44, explicitly in 1.48,
removed for 1.49
* Bio.dbdefs - effectively deprecated in 1.44, explicitly in 1.48,
removed for 1.49
* Bio.formatdefs - effectively deprecated in 1.44, explicitly in 1.48,
removed for 1.49

Open questions:
* Bio.DBXRef - does anyone know what this is for?
* Bio.SGMLExtractor - deprecated in 1.46, ready for removal?

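For anyone unsure what "adding a deprecation warning" amounts to in practice, a minimal sketch follows. The helper name warn_deprecated and the message wording are illustrative assumptions, not the exact code committed to CVS; in a real module the warnings.warn call would typically sit at the top level so that it fires on import.

```python
import warnings


def warn_deprecated(module_name, release):
    """Emit a DeprecationWarning for a module (hypothetical helper)."""
    warnings.warn(
        "%s is deprecated as of Biopython %s and may be removed in a "
        "future release." % (module_name, release),
        DeprecationWarning,
        stacklevel=2,
    )


if __name__ == "__main__":
    # Some Python versions hide DeprecationWarning by default, so switch
    # the filter on explicitly to see the message.
    warnings.simplefilter("always")
    warn_deprecated("Bio.Mindy", "1.49")
```

This also shows why the setup.py case is awkward: anything that imports a module carrying such a top-level call triggers the warning, whether or not the user asked for that module.
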
As a bonus once we've moved from CVS to SVN, we should be able to remove some of the now empty directories in CVS :) Peter From mjldehoon at yahoo.com Mon Sep 22 20:55:01 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Mon, 22 Sep 2008 17:55:01 -0700 (PDT) Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com> Message-ID: <553736.99426.qm@web62402.mail.re1.yahoo.com> The code in setup.py that causes the DeprecationWarning to appear can be fixed relatively easily. Basically, this code was relevant when Martel was also being distributed as a separate module. Nowadays, Biopython contains the latest version of Martel, so there's no reason to check it. The other function of the code in setup.py was to check if importing Martel raises any import errors, in particular for mxTextTools. That we can check by trying to import the dependencies directly, i.e. without going through Martel. After fixing the code in setup.py, we can add a DeprecationWarning to Bio.Martel. --Michiel. --- On Mon, 9/22/08, Peter wrote: > From: Peter > Subject: Re: [Biopython-dev] Modules to be removed from Biopython > To: biopython-dev at biopython.org > Date: Monday, September 22, 2008, 1:39 PM > As part of the general Martel/Mindy clean up, I've added > a deprecation > warning to Mindy and several other closely related modules, > but have > only made a docstring change to Martel. I'm not sure > if we should add > a deprecation warning to Martel directly - it would be > triggered by > running the Biopython setup.py file which is nasty. > Perhaps for this > special case, documentation is enough? 
> > Summary: > > * Martel - labelled as deprecated for 1.49, but no explicit > warning (see above) > * Bio.Mindy - deprecated for 1.49 > * Bio.Std - deprecated for 1.49 > * Bio.StdHandler - deprecated for 1.49 > * Bio.builders - deprecated for 1.49 > * Bio.Decode - deprecated for 1.49 > * Bio.Writer (and Bio.writers.*) deprecated in 1.48 > * Bio.expressions - deprecated in 1.44, removed for 1.49 > * Bio.config - effectively deprecated in 1.44, explicitly > in 1.48, > removed for 1.49 > * Bio.dbdefs - effectively deprecated in 1.44, explicitly > in 1.48, > removed for 1.49 > * Bio.formatdefs - effectively deprecated in 1.44, > explicitly in 1.48, > removed for 1.49 > > Open questions: > * Bio.DBXRef - does anyone known what this is for? > * Bio.SGMLExtractor - deprecated in 1.46, ready for > removal? > > As a bonus once we've moved from CVS to SVN, we should > be able to > remove some of the now empty directories in CVS :) > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Tue Sep 23 05:02:17 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Sep 2008 10:02:17 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <553736.99426.qm@web62402.mail.re1.yahoo.com> References: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com> <553736.99426.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00809230202k507e8ac3m16dc889b245e1c51@mail.gmail.com> On Tue, Sep 23, 2008 at 1:55 AM, Michiel de Hoon wrote: > The code in setup.py that causes the DeprecationWarning to > appear can be fixed relatively easily. Basically, this code was > relevant when Martel was also being distributed as a separate > module. Nowadays, Biopython contains the latest version of > Martel, so there's no reason to check it. 
The other function of the
> code in setup.py was to check if importing Martel raises any
> import errors, in particular for mxTextTools. That we can check
> by trying to import the dependencies directly, i.e. without going
> through Martel. After fixing the code in setup.py, we can add a
> DeprecationWarning to Bio.Martel.

That sounds positive. I was thinking we might want to edit setup.py
so that it doesn't complain loudly if mxTextTools is missing - given
this will now only be needed for deprecated modules. Do you have any
view on this?

Peter

From biopython at maubp.freeserve.co.uk Tue Sep 23 05:12:32 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 10:12:32 +0100
Subject: [Biopython-dev] Numeric/numpy
In-Reply-To: <415448.81411.qm@web62403.mail.re1.yahoo.com>
References: <320fb6e00809190842i6583f7bard82b03d5ea36f51e@mail.gmail.com>
 <415448.81411.qm@web62403.mail.re1.yahoo.com>
Message-ID: <320fb6e00809230212tb5a763cp8fdd58ef90fcc6ba@mail.gmail.com>

I was just thinking about the situation where people have both Numeric
and numpy installed, and that rather than using:

try:
    from Numeric import x, y, z
except ImportError:
    from numpy.oldnumeric import x, y, z

arguably we should be giving numpy priority. One solution would be
something like this:

try:
    from numpy.oldnumeric import x, y, z
except ImportError, e:
    try :
        from Numeric import x, y, z
    except ImportError :
        raise e #Want to complain about numpy, not Numeric

Unfortunately this is rather long!

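One way to shorten the numpy-first fallback would be to hide it in a small helper. The sketch below is only an illustration: import_first is a made-up name (nothing like it exists in Biopython), and it relies on importlib, which is only available in newer Python versions.

```python
import importlib


def import_first(*module_names):
    """Return the first module that imports cleanly (hypothetical helper).

    If none can be imported, re-raise the ImportError from the FIRST
    (preferred) name, so the complaint is about numpy, not Numeric.
    """
    if not module_names:
        raise ValueError("at least one module name is required")
    first_error = None
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError as err:
            if first_error is None:
                first_error = err
    raise first_error


if __name__ == "__main__":
    # Stand-in names so the example runs anywhere: the preferred module
    # is missing, so the fallback (the standard library's math) is used.
    mod = import_first("no_such_preferred_module", "math")
    print(mod.__name__)
```

With something like this the call site shrinks to one line, e.g. import_first("numpy.oldnumeric", "Numeric"), at the cost of using attribute access on the returned module instead of "from ... import x, y, z".
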
Alternatively, shall we wait until the end of the week (say), and if no-one objects to a straight switch from Numeric to numpy, proceed with just the following?: from numpy.oldnumeric import x, y, z Peter From chapmanb at 50mail.com Tue Sep 23 08:08:09 2008 From: chapmanb at 50mail.com (Brad Chapman) Date: Tue, 23 Sep 2008 08:08:09 -0400 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com> References: <492634.64872.qm@web62414.mail.re1.yahoo.com> <320fb6e00806270950k479eda23ia96d3c2d36557510@mail.gmail.com> <320fb6e00807160540w325fe995mea400b0014fd7c2e@mail.gmail.com> <320fb6e00807220913g64613854j7a1deb5b4357f726@mail.gmail.com> <320fb6e00808080417y483f74c8xd94dd7ca9eea0476@mail.gmail.com> <320fb6e00809180515g59e53bddoa1d83242df198a1@mail.gmail.com> <320fb6e00809200431h2ace4e4dge0cc9835e8d8d53f@mail.gmail.com> <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com> Message-ID: <20080923120809.GG13074@localdomain> Hi Peter; Thanks for your work cleaning this up. > Open questions: > * Bio.DBXRef - does anyone known what this is for? > * Bio.SGMLExtractor - deprecated in 1.46, ready for removal? DBXref is associated with all the Martel parsing, so it can be removed/deprecated as well. It was used in building SeqRecords from Martel descriptions (Bio.builders.SeqRecord.sequence). Brad -- Brad Chapman Codon Devices http://www.codondevices.com From lpritc at scri.ac.uk Tue Sep 23 08:52:38 2008 From: lpritc at scri.ac.uk (Leighton Pritchard) Date: Tue, 23 Sep 2008 13:52:38 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <20080923120809.GG13074@localdomain> Message-ID: Hi all, It looks like Bio.DBXRef provides a dictionary of dictionaries that associate database identifiers from a number of file formats with the appropriate databases. This sort of thing might be useful to keep around (i.e. 
not to have to rebuild from scratch) if there is an intention to populate the dbxref table with consistent Dbnames for divergent identifiers. However, Peter appears to have noted in the code for Loader.py that this behaviour would be inconsistent with the other Bio* projects, and mentions bug 2405 in that context. L. On 23/09/2008 13:08, "Brad Chapman" wrote: > Hi Peter; > Thanks for your work cleaning this up. > >> Open questions: >> * Bio.DBXRef - does anyone known what this is for? >> * Bio.SGMLExtractor - deprecated in 1.46, ready for removal? > > DBXref is associated with all the Martel parsing, so it can be > removed/deprecated as well. It was used in building SeqRecords from > Martel descriptions (Bio.builders.SeqRecord.sequence). > > Brad -- Dr Leighton Pritchard MRSC D131, Plant Pathology Programme, SCRI Errol Road, Invergowrie, Perth and Kinross, Scotland, DD2 5DA e:lpritc at scri.ac.uk w:http://www.scri.ac.uk/staff/leightonpritchard gpg/pgp: 0xFEFC205C tel:+44(0)1382 562731 x2405
From jblanca at btc.upv.es Tue Sep 23 08:40:24 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Tue, 23 Sep 2008 14:40:24 +0200 Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes Message-ID: <200809231440.24684.jblanca@btc.upv.es> Hi: I'm still interested on the design of the Sequence and Alignment classes. For my work I need sequence classes with some extended features. I need a SequenceWithQuality class and a Seq class capable of holding information about features located in different regions of the sequence. I could use SeqRecord for the sequence with features and extend Seq for the SequenceWithQuality, but I have found some problems with this approach. SeqRecord still doesn't have a __getitem__ method. Also, SeqRecord exposes the implementation of the features collection, it's a public list. That I think is a limitation. For instance, we could be interested in controlling if a the feature added is inside the region covered by the sequence. We can't also ask for features by their name or type. I understand that keeping compatibility is paramount for BioPython and I share that concern. I also understand that having two classes to do the same job is not a nice thing. Nevertheless I have been thinking about these issues and I have implemented a non-mutable sequence class with these ideas in mind. I plan to use this implementation to write an Alignment class capable of dealing with ESTs assemblies. The most different aspect of this proposal and the code actually alive in BioPython are the LocatableFeature and Location classes. 
LocatableFeature is equivalent to SeqFeature, but while SeqFeature is mostly a struct with no methods LocatableFeature has a __getitem__, __len__ and complement. Location is inspired by the BioRange BioPerl class. I would like to have equivalent functions in BioPython and I'm willing to help in the adaptation the actual BioPython classes. I would appreciate to hear your suggestions and criticisms about the classes that I'm sending. Best regards, Jose Blanca P.D. In the tests files there is detailed information about how these classes would work. -- Jose M. Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) -------------- next part -------------- A non-text attachment was scrubbed... Name: biolib.tar.gz Type: application/x-tgz Size: 9873 bytes Desc: not available URL: From mjldehoon at yahoo.com Tue Sep 23 09:58:14 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 23 Sep 2008 06:58:14 -0700 (PDT) Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00809230202k507e8ac3m16dc889b245e1c51@mail.gmail.com> Message-ID: <914965.75347.qm@web62403.mail.re1.yahoo.com> > That sounds positive. I was thinking we might want to edit > setup.py > so that it doesn't complain loudly if mxTextTools is > missing - given > this will now only be needed for deprecated modules. Do > you have any > view on this? Since mxTextTools is only needed at run time and not at compile time, I think we do not have to check at all if it is present or not. Then in Martel, if importing mxTextTools fails, we can give an informative error message saying that the user should install mxTextTools. Since Martel is deprecated anyway, I think that that is quite sufficient. 
Compare it to ReportLab: Bio.Graphics imports it, but we don't check
in setup.py if it is present or not.

--Michiel

From biopython at maubp.freeserve.co.uk Tue Sep 23 10:37:29 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 15:37:29 +0100
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <200809231440.24684.jblanca@btc.upv.es>
References: <200809231440.24684.jblanca@btc.upv.es>
Message-ID: <320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com>

On Tue, Sep 23, 2008 at 1:40 PM, Jose Blanca wrote:
> Hi:
> I'm still interested on the design of the Sequence and Alignment classes. For
> my work I need sequence classes with some extended features. I need a
> SequenceWithQuality class and a Seq class capable of holding information
> about features located in different regions of the sequence.
> I could use SeqRecord for the sequence with features and extend Seq for the
> SequenceWithQuality, but I have found some problems with this approach.

I would also like to be able to have SeqRecord or Seq objects with a
quality sequence. This is probably more important than a general "per
letter annotation" system for sequences. Would you want to use
integers, floats or characters for the quality scores?

> SeqRecord still doesn't have a __getitem__ method.

What do you think of the __getitem__ method proposed in attachment 942
on Bug 2507?
http://bugzilla.open-bio.org/show_bug.cgi?id=2507

> Also, SeqRecord exposes the implementation of the features collection,
> it's a public list. That I think is a limitation. For instance, we could be interested
> in controlling if a the feature added is inside the region covered by the sequence.

Yes, because it is currently a public list we can't easily stop the
user putting inappropriate features (or other objects) into the list.
A list-like sub-class with some brains behind it might be one
backwards compatible approach. But do we really need to worry about
this?

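To make the list-like sub-class idea concrete, here is a rough sketch. Everything in it is hypothetical: CheckedFeatureList, the SimpleFeature stand-in and its start/end attributes are invented for illustration (a real SeqFeature holds a location object instead), and only append, extend and insert are guarded.

```python
class SimpleFeature(object):
    """Stand-in for a feature with integer start/end (hypothetical)."""

    def __init__(self, start, end, feature_type="misc_feature"):
        self.start = start
        self.end = end
        self.type = feature_type


class CheckedFeatureList(list):
    """List that refuses features lying outside the parent sequence."""

    def __init__(self, seq_length, features=()):
        list.__init__(self)
        self._seq_length = seq_length
        self.extend(features)

    def _check(self, feature):
        # Accept only 0 <= start <= end <= sequence length.
        if not (0 <= feature.start <= feature.end <= self._seq_length):
            raise ValueError(
                "feature %i..%i is outside sequence of length %i"
                % (feature.start, feature.end, self._seq_length)
            )
        return feature

    def append(self, feature):
        list.append(self, self._check(feature))

    def insert(self, index, feature):
        list.insert(self, index, self._check(feature))

    def extend(self, features):
        # Validate everything first so a bad item leaves the list unchanged.
        checked = [self._check(f) for f in features]
        list.extend(self, checked)


if __name__ == "__main__":
    features = CheckedFeatureList(100)
    features.append(SimpleFeature(10, 50, "CDS"))
    try:
        features.append(SimpleFeature(90, 120))
    except ValueError as err:
        print("rejected: %s" % err)
    print(len(features))
```

Existing code that treats the features attribute as a plain list would keep working, since this is still a list; slice assignment and += would need the same treatment before the check could be relied on.
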
> We can't also ask for features by their name or type. You can work around this by creating a lookup dictionary, e.g. http://www.warwick.ac.uk/go/peter_cock/python/genbank/#indexing_features Perhaps we could add a "lookup feature" function given say an annotation key (e.g. "locus_tag") and value (e.g. "NEQ010") plus perhaps feature type (e.g. "CDS"). > I understand that keeping compatibility is paramount for BioPython and I share > that concern. I also understand that having two classes to do the same job is > not a nice thing. I agree. Especially now that Bio.SeqIO and AlignIO seem to be working out pretty well and these are pretty tied into the SeqRecord object. > Nevertheless I have been thinking about these issues and I have > implemented a non-mutable sequence class with these ideas in mind. I > plan to use this implementation to write an Alignment class capable of > dealing with ESTs assemblies. Dealing nicely with EST assemblies is a valuable goal. > The most different aspect of this proposal and the code actually alive in > BioPython are the LocatableFeature and Location classes. LocatableFeature is > equivalent to SeqFeature, but while SeqFeature is mostly a struct with no > methods LocatableFeature has a __getitem__, __len__ and complement. > Location is inspired by the BioRange BioPerl class. I personally don't like the current way Biopython stores the location for SeqFeatures containing sub-features (e.g. anything with a join). The join-location can only be determined from a combination of the location of each sub-feature. However, this standard is currently implemented and stable, and supported in Biopython's BioSQL wrapper. > I would like to have equivalent functions in BioPython and I'm willing to help > in the adaptation the actual BioPython classes. I would appreciate to hear > your suggestions and criticisms about the classes that I'm sending. 
> Best regards,

If there are enough people interested in re-working the
Seq/MutableSeq/SeqRecord objects with an API break, we could seriously
discuss this as part of a hypothetical "Biopython 2.0". Once we move
from CVS to SVN it would also be possible to set up a branch in the
repository to experiment there.

However, I think there is still plenty of potential for improving
things in a backwards compatible manner (and have opened several
enhancement bugs on bugzilla for this). I would like to try and
tackle these before breaking the existing API.

Peter

From biopython at maubp.freeserve.co.uk Tue Sep 23 10:52:14 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 15:52:14 +0100
Subject: [Biopython-dev] Modules to be removed from Biopython
In-Reply-To: <914965.75347.qm@web62403.mail.re1.yahoo.com>
References: <320fb6e00809230202k507e8ac3m16dc889b245e1c51@mail.gmail.com>
 <914965.75347.qm@web62403.mail.re1.yahoo.com>
Message-ID: <320fb6e00809230752q579133b1v6f89260b00811e8f@mail.gmail.com>

On Tue, Sep 23, 2008 at 2:58 PM, Michiel de Hoon wrote:
>> That sounds positive. I was thinking we might want to edit
>> setup.py so that it doesn't complain loudly if mxTextTools is
>> missing - given this will now only be needed for deprecated
>> modules. Do you have any view on this?
>
> Since mxTextTools is only needed at run time and not at compile
> time, I think we do not have to check at all if it is present or not.

Agreed - done in CVS.

> Then in Martel, if importing mxTextTools fails, we can give an
> informative error message saying that the user should install
> mxTextTools.

That might be worth exploring - many people will probably be able to
deduce this from an ImportError, but an informative error is more
helpful.

> Since Martel is deprecated anyway, I think that that is quite
> sufficient. Compare it to ReportLab: Bio.Graphics imports it,
> but we don't check in setup.py if it is present or not.

OK.

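On the informative error for a missing mxTextTools, a rough sketch of what that could look like is below. The helper import_or_explain and the message wording are assumptions made up for illustration, not code from Martel, and the demo uses a stand-in module name so that it runs without mxTextTools installed.

```python
import importlib


def import_or_explain(module_name, needed_by, install_hint):
    """Import module_name, or raise an ImportError that says what to do.

    Hypothetical helper for illustration; not part of Martel or Biopython.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Keep the original complaint in the message, then add the hint.
        raise ImportError(
            "%s requires %s, which could not be imported (%s). %s"
            % (needed_by, module_name, err, install_hint)
        )


if __name__ == "__main__":
    try:
        # Stand-in name for a dependency that is not installed.
        import_or_explain("no_such_text_tools", "Martel",
                          "Please install mxTextTools.")
    except ImportError as err:
        print(err)
```

The error is still an ImportError, so anything that already catches ImportError around a Martel import keeps behaving the same way.
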
Peter From bugzilla-daemon at portal.open-bio.org Tue Sep 23 11:00:21 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Tue, 23 Sep 2008 11:00:21 -0400 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200809231500.m8NF0LeP017539@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 ------- Comment #10 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-23 11:00 EST ------- (In reply to comment #2) > Does this means that SeqRecord would deprecate the .seq attribute? If the .seq > attribute is not removed slicing could be used in it like: my_seq[1:100] and > my_seq.seq[1:100]. > If you had a SeqRecord, record, then yes with this patch you could do: record[1:100] - gives another SeqRecord with annotation record.seq[1:100] - gives a Seq object with no annotation record[1:100].seq - should give an equivalent Seq object with no annotation Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Tue Sep 23 11:13:28 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Sep 2008 16:13:28 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <553736.99426.qm@web62402.mail.re1.yahoo.com> References: <320fb6e00809221039j2a1a67fcsda2ffca266f4eea8@mail.gmail.com> <553736.99426.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00809230813n49c967d8t5f24cad8c9a4009f@mail.gmail.com> On Tue, Sep 23, 2008 at 1:55 AM, Michiel de Hoon wrote: > The code in setup.py that causes the DeprecationWarning to appear > can be fixed relatively easily. Basically, this code was relevant when > Martel was also being distributed as a separate module. 
Nowadays,
> Biopython contains the latest version of Martel, so there's no reason
> to check it.

Probably a safe assumption.

> The other function of the code in setup.py was to check if importing
> Martel raises any import errors, in particular for mxTextTools. Then
> we can check by trying to import the dependencies directly, i.e.
> without going through Martel.

As discussed earlier, we've agreed not to worry at install time
whether or not mxTextTools is present.

So basically, you recommend we just remove all the Martel special case
code in setup.py, and simply install it automatically like any other
module? I've made this change locally and it seems to be fine. If
this is what you had in mind, I can commit this to CVS too.

> After fixing the code in setup.py, we can add a DeprecationWarning to Bio.Martel.

Agreed.

Peter

From biopython at maubp.freeserve.co.uk Tue Sep 23 11:19:19 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Tue, 23 Sep 2008 16:19:19 +0100
Subject: [Biopython-dev] Online access, Bio.PubMed & Bio.GenBank vs Bio.Entrez
In-Reply-To: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
References: <320fb6e00808150928w1feb55d0j25e42c17d7230091@mail.gmail.com>
Message-ID: <320fb6e00809230819h44b34241t5e8bd15cf5f5043c@mail.gmail.com>

In August 2008 Peter wrote:
> This is a slightly long email covering what to do with the online code
> in Bio.PubMed and Bio.GenBank, and how to make Bio.Entrez easier to
> use. All these modules are essentially wrapping access to the NCBI
> Entrez database via the Entrez Programming Utilities (EUtils).
> http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html

One problem I raised in August with the online parts of Bio.PubMed and
Bio.GenBank was they didn't provide a way to supply the user's email
address. An email address can now be specified via Bio.Entrez BEFORE
calling the Bio.PubMed or Bio.GenBank online functions, however this
is non-obvious.

However, we still have the inherent problem that these simple functions do not allow use of the NCBI's history feature. (Of course, for some situations this is never going to apply and therefore isn't a problem). > In addition to encouraging the use of Bio.Entrez by documenting it > prominently in the tutorial, we could go further and deprecate the > "user friendly" Bio.PubMed and Bio.GenBank wrapper functions. > What do people think of this? Deprecating the Dictionary classes in > particular could be a good idea as they use the old fashioned parser > objects. In the release notes for Biopython 1.48, I wrote: >> Bio.PubMed and the online code in Bio.GenBank are now considered >> obsolete, and we intend to deprecate them after the next release. >> For accessing PubMed and GenBank, please use Bio.Entrez instead. Are we agreed on deprecating (some or all of) these bits for Biopython 1.49? I'm happy to put the question to the main mailing list first. Peter From biopython at maubp.freeserve.co.uk Tue Sep 23 12:04:54 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Tue, 23 Sep 2008 17:04:54 +0100 Subject: [Biopython-dev] test_ParserSupport.py Message-ID: <320fb6e00809230904i48ed262fl183822cdc23ac212@mail.gmail.com> The module Bio/ParserSupport.py provides general code for scanner/consumer parsers (some of which historically have been written using Martel). It is used in several pure python parsers (e.g. Bio.SwissProt). However, Tests/test_ParserSupport.py (its unit test) did use Martel explicitly for the EventGenerator class, and would fail if mxTextTools is not installed. I have removed this part of the test in CVS. 
Looking over the codebase with grep, EventGenerator is used in: * Bio.ECell (deprecated in 1.46) * Bio.Emboss.Primer [STILL CURRENT] * Bio.IntelliGenetics (deprecated in 1.48) * Bio.MetaTool (deprecated in 1.48) * Bio.NBRF (deprecated in 1.48) It looks like Bio.Emboss.Primer is the only current bit of code using Bio.ParserSupport.EventGenerator, so it would be nice to still have this covered by the unit test. Does anyone fancy re-writing the EventGenerator part of unit test? I think this could be done by creating a simple python Scanner object... Peter From mjldehoon at yahoo.com Tue Sep 23 19:26:39 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 23 Sep 2008 16:26:39 -0700 (PDT) Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <320fb6e00809230813n49c967d8t5f24cad8c9a4009f@mail.gmail.com> Message-ID: <202271.5433.qm@web62402.mail.re1.yahoo.com> > So basically, you recommend we just remove all the Martel > special case > code in setup.py, and simply install it automatically like > any other > module? I've made this change locally and it seems to > be fine. If > this is what you had in mind, I can commit this to CVS too. > Yes, I think that that is a good solution. --Michiel. From mjldehoon at yahoo.com Tue Sep 23 19:37:01 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Tue, 23 Sep 2008 16:37:01 -0700 (PDT) Subject: [Biopython-dev] test_ParserSupport.py In-Reply-To: <320fb6e00809230904i48ed262fl183822cdc23ac212@mail.gmail.com> Message-ID: <463161.32038.qm@web62401.mail.re1.yahoo.com> Bio.Emboss.Primer was also deprecated in 1.48 (replaced by Bio.Emboss.Primer3 and Bio.Emboss.PrimerSearch). --Michiel. 
From biopython at maubp.freeserve.co.uk Wed Sep 24 04:41:41 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 09:41:41 +0100 Subject: [Biopython-dev] Modules to be removed from Biopython In-Reply-To: <202271.5433.qm@web62402.mail.re1.yahoo.com> References: <320fb6e00809230813n49c967d8t5f24cad8c9a4009f@mail.gmail.com> <202271.5433.qm@web62402.mail.re1.yahoo.com> Message-ID: <320fb6e00809240141k2f3f2507xe58094cd0ddc79d8@mail.gmail.com> On Wed, Sep 24, 2008 at 12:26 AM, Michiel de Hoon wrote: >> So basically, you recommend we just remove all the Martel >> special case code in setup.py, and simply install it >> automatically like any other module? I've made this >> change locally and it seems to be fine. If this is what >> you had in mind, I can commit this to CVS too. > > Yes, I think that that is a good solution. OK, change made in CVS. Peter From biopython at maubp.freeserve.co.uk Wed Sep 24 04:53:05 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 09:53:05 +0100 Subject: [Biopython-dev] test_ParserSupport.py In-Reply-To: <463161.32038.qm@web62401.mail.re1.yahoo.com> References: <320fb6e00809230904i48ed262fl183822cdc23ac212@mail.gmail.com> <463161.32038.qm@web62401.mail.re1.yahoo.com> Message-ID: <320fb6e00809240153o6fd697acw9a6a7b6b952e0c3d@mail.gmail.com> On Wed, Sep 24, 2008 at 12:37 AM, Michiel de Hoon wrote: > > Bio.Emboss.Primer was also deprecated in 1.48 (replaced by Bio.Emboss.Primer3 and Bio.Emboss.PrimerSearch). Thanks - I clearly didn't check carefully enough, I just read the CVS comments.
I've made a slight revision to Bio.Emboss.Primer in CVS to expand the module docstring (and say it is deprecated) plus moved the deprecation warning above the Martel import (which would fail if mxTextTools wasn't installed - meaning the deprecation warning wasn't shown). I *think* this means Bio.ParserSupport.EventGenerator is now only being used in deprecated modules, so the lack of a unit test covering this is less important. Peter From bugzilla-daemon at portal.open-bio.org Wed Sep 24 06:44:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 24 Sep 2008 06:44:15 -0400 Subject: [Biopython-dev] [Bug 2489] KDTree NN search without specifying radius In-Reply-To: Message-ID: <200809241044.m8OAiF3w009359@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2489 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-24 06:44 EST ------- These C++ suggestions are now obsolete, as the C++ part of Bio.KDTree has been re-written in plain C (in CVS after the release of Biopython 1.48). This was to simplify the build process as the C++ code had problems on some platforms. Making the radius optional in KDTree searches is still a potentially useful enhancement... -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From biopython at maubp.freeserve.co.uk Wed Sep 24 12:58:21 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 17:58:21 +0100 Subject: [Biopython-dev] Versions of numpy/Numeric Message-ID: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> Hi again, I was wondering what versions of numpy and Numeric have been tested with Biopython CVS? 
For anyone who didn't know, you can check at the python prompt with:

    import numpy
    print numpy.version.version

and,

    import Numeric
    print Numeric.__version__

Using CVS Biopython compiled from source, the unit tests all seem fine on the following three setups:

Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1
Test suite looks fine

Linux, python 2.5, Numeric 24.2 and numpy 1.0
Fine, ignoring the Numeric eigenvalue problem in test_SVDSuperimposer.py previously discussed

Linux, python 2.3, numpy 1.1.1 [no Numeric]
Fine, after fixing some broken imports which were using recent python syntax, and reducing the number of decimal places used in test_SVDSuperimposer.py (numpy and Numeric give very slightly different answers).

Note that testing where there is NO version of Numeric is important (as in this third example), as if both numpy and Numeric are installed currently most of the pure python modules will use Numeric by choice.

Also note that running the test suite via run_tests.py will hide any deprecation warnings from numpy - I tried running test_SVDSuperimposer.py on its own and got:

/home/xxx/lib/python2.3/site-packages/numpy/lib/utils.py:114: DeprecationWarning: ('matrixmultiply is deprecated, use dot',)

I've now updated Bio/SVDSuperimposer/SVDSuperimposer.py to use dot instead of matrixmultiply (this works on both numpy and Numeric).

Peter

From bsouthey at gmail.com Wed Sep 24 14:19:42 2008
From: bsouthey at gmail.com (Bruce Southey)
Date: Wed, 24 Sep 2008 13:19:42 -0500
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com>
Message-ID: <48DA84BE.7080105@gmail.com>

Peter wrote:
> Hi again,
>
> I was wondering what versions of numpy and Numeric have been tested
> with Biopython CVS?
For anyone who didn't know, you can check at the > python prompt with: > > import numpy > print numpy.version.version > Actually just do numpy.__version__ Currently numpy 1.2 is at the second release candidate stage. Note that this version requires Python 2.4 and uses the nose testing framework version 0.10 or later for testing. Somewhat related to this, what is the appropriate way to find the version of BioPython installed within Python? Bruce From bsouthey at gmail.com Wed Sep 24 15:22:35 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 24 Sep 2008 14:22:35 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> Message-ID: <48DA937B.8000901@gmail.com> Peter wrote: > Hi again, > > I was wondering what versions of numpy and Numeric have been tested > with Biopython CVS? For anyone who didn't know, you can check at the > python prompt with: > > import numpy > print numpy.version.version > > and, > > import Numeric > print Numeric.__version__ > > Using CVS Biopython compiled from source, the unit tests all seem fine > on the following three setups: > > Mac OS X, python 2.5.2, Numeric 24.2 and numpy 1.1.1 > Test suite looks fine > > Linux, python 2.5, Numeric 24.2 and numpy 1.0 > Fine, ignoring the Numeric eigenvalue problem in > test_SVDSuperimposer.py previously discussed > > Linux, python 2.3, numpy 1.1.1 [no Numeric] > Fine, after fixing some broken imports which were using recent python > syntax, and reducing the number of decimal places used in > test_SVDSuperimposer.py (numpy and Numeric give very slightly > different answers). > > Note that testing where there is NO version of Numeric is important > (as in this third example), as if both numpy and Numeric are installed > currently most of the pure python modules will use Numeric by choice. 
>
> Also note that running the test suite via run_tests.py will hide any
> deprecation warnings from numpy - I tried running
> test_SVDSuperimposer.py on its own and got:
> /home/xxx/lib/python2.3/site-packages/numpy/lib/utils.py:114:
> DeprecationWarning: ('matrixmultiply is deprecated, use dot',)
> I've now updated Bio/SVDSuperimposer/SVDSuperimposer.py to use dot
> instead of matrixmultiply (this works on both numpy and Numeric).
>
> Peter

Hi,

With Numeric 24.2 and Numpy 1.2rc2 installed on linux 64bit and Python 2.5.1, "python setup.py test" gives several deprecation warnings from test_Cluster and test_KDTree, but both still pass. test_MarkovModel fails, as I found with BioPython 1.48 (Bug 2589). This is most likely a 64-bit thing with Python 2.5.

ERROR: test_MarkovModel
----------------------------------------------------------------------
Traceback (most recent call last):
  File "run_tests.py", line 152, in runTest
    self.runSafeTest()
  File "run_tests.py", line 165, in runSafeTest
    cur_test = __import__(self.test_name)
  File "test_MarkovModel.py", line 65, in <module>
    p_emission=p_emission
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py", line 220, in _baum_welch
    lpseudo_initial, lpseudo_transition, lpseudo_emission,)
  File "/home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/MarkovModel.py", line 276, in _baum_welch_one
    lp_initial[:] = lp_arcout_t[:,0]

I did notice that both test_MarkovModel.py and test_SVDSuperimposer.py first try to import Numeric - as does MarkovModel.py. However this same bug is still likely since numpy.oldnumeric is used.
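
The import-order point can be sketched as follows. This is only an illustration under stated assumptions, not the code in MarkovModel.py: `import_first` is a hypothetical helper, and the demo uses stand-in module names so that it runs without numpy or Numeric installed.

```python
import importlib


def import_first(candidates):
    """Return (name, module) for the first importable name in candidates.

    Hypothetical helper: the order of `candidates` is the preference order,
    so listing "numpy" before "Numeric" would prefer numpy when both exist.
    """
    for name in candidates:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # this candidate is missing, try the next one
    raise ImportError("none of %r could be imported" % (candidates,))


# Stand-in names so the sketch runs anywhere: the first does not exist,
# so the helper falls through to the standard-library "math" module.
chosen_name, chosen_module = import_first(["no_such_module_xyz", "math"])
print("using %s" % chosen_name)
```

With a list such as ["numpy", "Numeric"], the same loop would only fall back to Numeric when numpy is missing, which is the reverse of the behaviour described in this thread.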
Regards
Bruce

From biopython at maubp.freeserve.co.uk Wed Sep 24 16:37:43 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 21:37:43 +0100
Subject: [Biopython-dev] Versions of numpy/Numeric
In-Reply-To: <48DA937B.8000901@gmail.com>
References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <48DA937B.8000901@gmail.com>
Message-ID: <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com>

> Hi,
> With Numeric 24.2 and Numpy 1.2rc2 installed on linux 64bit and Python
> 2.5.1.

I don't currently have access to a machine with that setup - so this is very useful. Thanks!

> python python setup.py test gives several deprecation warnings from
> test_Cluster and test_KDTree but still pass.

Interesting - these may be deprecations for Numpy 1.2 - if you have the output handy could you share it? If it's very long, you can just send it to me off the list (or file a bug with the details).

> test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This is
> most likely a 64-bit thing with Python 2.5.
>
> ERROR: test_MarkovModel
> ----------------------------------------------------------------------
> Traceback (most recent call last):
> ...
> line 276, in _baum_welch_one
> lp_initial[:] = lp_arcout_t[:,0]
>
> I did notice that both test_MarkovModel.py and test_SVDSuperimposer.py have
> first try to import Numeric - as does MarkovModel.py.

Yeah - I'd raised that on the dev list earlier. Depending on how we do this, we could end up with misleading import failures (an error about Numeric when we really care about numpy). If however no-one objects to completely dropping Numeric for the next release, things become much simpler.

> However this same bug is still likely since numpy.oldnumeric is used.

If you have both Numeric and numpy installed, this module is probably using Numeric and thus still fails. Could you try flipping the imports round in .../Bio/MarkovModel.py to see if this problem goes away (i.e.
make sure it uses numpy instead of Numeric)?

If the problem is still there, would you mind also trying the workaround you suggested on Bug 2589 please (dropping the [:] from the left-hand side)? If that works for you on both numpy and Numeric it seems a worthwhile change for CVS.

Thanks

Peter

From biopython at maubp.freeserve.co.uk Wed Sep 24 17:12:24 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Wed, 24 Sep 2008 22:12:24 +0100
Subject: [Biopython-dev] determining the version
Message-ID: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com>

>> I was wondering what versions of numpy and Numeric have been tested
>> with Biopython CVS? For anyone who didn't know, you can check at the
>> python prompt with:
>>
>> import numpy
>> print numpy.version.version
>>
>
> Actually just do
> numpy.__version__

That is nicer :)

> Somewhat related to this, what is the appropriate way to find the version of
> BioPython installed within Python?

So I'm not the only person to have wondered about this. For now, I can only suggest an ugly workaround:

    import Martel
    print Martel.__version__

Since Biopython 1.45, by convention the Martel version has been incremented to match that of Biopython. Of course, in a few releases' time we probably won't be including Martel any more.

Perhaps we should add a __version__ to Bio/__init__.py for future releases, with the release "script" modified to ensure this gets incremented to match that used in setup.py (and Martel).
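
A minimal sketch of how calling code might look up such an attribute, assuming only that a package may or may not be installed and may or may not define `__version__`. The helper name `get_version` is hypothetical and is not part of Biopython.

```python
import importlib


def get_version(module_name):
    """Return module.__version__, or None if the module cannot be imported
    or does not define a __version__ attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Report whatever happens to be installed; missing packages are not an error.
for name in ("Bio", "Martel", "numpy"):
    print("%s: %s" % (name, get_version(name) or "not available"))
```

Returning None rather than raising keeps the check usable in scripts that only want to report versions, at the cost of not distinguishing "not installed" from "installed but without a __version__".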
Peter From bsouthey at gmail.com Wed Sep 24 17:52:04 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 24 Sep 2008 16:52:04 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <48DA937B.8000901@gmail.com> <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com> Message-ID: <48DAB684.4030103@gmail.com> Peter wrote: >> Hi, >> With Numeric 24.2 and Numpy 1.2rc2 installed on linux 64bit and Python >> 2.5.1. >> > > I don't currently have access to a machine with that setup - so this > is very useful. Thanks! > > >> python python setup.py test gives several deprecation warnings from >> test_Cluster and test_KDTree but still pass. >> > > Iteresting - these may be deprecations for Numpy 1.2 - if you have the > output handy could you share it? If its very long, you can just send > it to me off the list (or file a bug with the details). > The not very informative list says it also ''includes a few some minor API breakage first scheduled in the 1.1 release" http://scipy.org/scipy/numpy/milestone/1.2.0 But this is one of them! There is no message in the numpy 1.1 code. There is an email on Jul 13, 2008 (http://www.nabble.com/Newly-deprecated-API-functions-td18436792.html) that notes these are depreciated and will be scheduled removed for Numpy 1.3. So rather annoying! Sorry, I did not realize until I looked that these were 1.2 related. So the relevant output is: test_Cluster ... test_Cluster.py:584: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. matrix = distancematrix(data, mask=mask, weight=weight) test_Cluster.py:584: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. matrix = distancematrix(data, mask=mask, weight=weight) test_Cluster.py:629: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. 
clusterid, error, nfound = kmedoids(matrix, npass=1000) test_Cluster.py:629: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. clusterid, error, nfound = kmedoids(matrix, npass=1000) test_Cluster.py:129: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. clusterid, error, nfound = kcluster(data, nclusters=nclusters, mask=mask, weight=weight, transpose=0, npass=100, method='a', dist='e') test_Cluster.py:129: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. clusterid, error, nfound = kcluster(data, nclusters=nclusters, mask=mask, weight=weight, transpose=0, npass=100, method='a', dist='e') test_Cluster.py:166: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. clusterid, error, nfound = kcluster(data, nclusters=3, mask=mask, weight=weight, transpose=0, npass=100, method='a', dist='e') test_Cluster.py:166: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. clusterid, error, nfound = kcluster(data, nclusters=3, mask=mask, weight=weight, transpose=0, npass=100, method='a', dist='e') test_Cluster.py:522: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. clusterid, celldata = somcluster(data=data, mask=mask, weight=weight, transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100, dist='e') test_Cluster.py:522: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. clusterid, celldata = somcluster(data=data, mask=mask, weight=weight, transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100, dist='e') test_Cluster.py:555: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. clusterid, celldata = somcluster(data=data, mask=mask, weight=weight, transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100, dist='e') test_Cluster.py:555: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. 
clusterid, celldata = somcluster(data=data, mask=mask, weight=weight, transpose=0, nxgrid=10, nygrid=10, inittau=0.02, niter=100, dist='e') ok test_KDTree ... /home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/KDTree/KDTree.py:71: DeprecationWarning: PyArray_FromDims: use PyArray_SimpleNew. r=kdt.get_indices() /home/bsouthey/python/biopython_cvs/biopython/build/lib.linux-x86_64-2.5/Bio/KDTree/KDTree.py:71: DeprecationWarning: PyArray_FromDimsAndDataAndDescr: use PyArray_NewFromDescr. r=kdt.get_indices() ok > >> test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This is >> most likely a 64-bit thing with Python 2.5. >> >> ERROR: test_MarkovModel >> ---------------------------------------------------------------------- >> Traceback (most recent call last): >> ... >> line 276, in _baum_welch_one >> lp_initial[:] = lp_arcout_t[:,0] >> >> I did notice that both test_MarkovModel.py and test_SVDSuperimposer.py have >> first try to import Numeric - as does MarkovModel.py. >> > > Yeah - I'd raised that on the dev list earlier. Depending on how we > do this, we'll could end up with misleading import failures (an error > about Numeric when we really care about numpy). If however no-one > objects to completely dropping Numeric for the next release, things > become much simpler. > > >> However this same bug is still likely since numpy.oldnumeric is used. >> > > If you have both Numeric and numpy installed, this module is probably > using Numeric and thus still fails. Could you try flipping the > imports round in .../Bio/MarkovModel.py to see if this problem goes > away (i.e. make sure it uses numpy instead of Numeric)? > > If the problem is still there, would you mind also trying the work > around you suggested on Bug 2589 please (dropping the [:] from the > left-hand side)? If that works for you on both numpy and Numeric it > seems a worthwhile change for CVS. 
> > Thanks > > Peter > Luckily MarkovModel.py is almost self-contained so I used it independently of the installation. The test passes as is with numpy and if I drop the [:] it passes with both Numeric and numpy import statements. Bruce From bugzilla-daemon at portal.open-bio.org Wed Sep 24 18:24:02 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Wed, 24 Sep 2008 18:24:02 -0400 Subject: [Biopython-dev] [Bug 2589] Errors in running tests in 1.48 In-Reply-To: Message-ID: <200809242224.m8OMO2R2019335@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2589 ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-24 18:24 EST ------- (In reply to comment #3) > > test_MarkovModel > > ================ > > 24.2 > > Based on a Google search, this is a 64bit problem with Python 2.5 and Numeric. > > So either do: > 1) Drop the [:] from the left-hand side: > lp_initial = lp_arcout_t[:,0] Over on the mailing list, Bruce reported this fix works for both numpy and Numeric. I've now checked this into CVS, MarkovModel.py revision 1.6. Thanks Bruce! -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
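
One thing worth keeping in mind with this kind of change: dropping the [:] is not purely cosmetic, because slice assignment writes into the existing object while plain assignment only rebinds a name. The sketch below uses ordinary Python lists so that it runs without numpy or Numeric; the same aliasing rule applies to array objects. The function names are hypothetical, and whether the difference matters for _baum_welch_one depends on how its callers use lp_initial, which this thread does not show.

```python
def update_in_place(target, new_values):
    """Slice assignment: writes into the object the caller passed in."""
    target[:] = new_values


def rebind_local(target, new_values):
    """Plain assignment: only rebinds the local name 'target'."""
    target = new_values
    return target


caller_values = [0.0, 0.0, 0.0]
update_in_place(caller_values, [0.2, 0.3, 0.5])
print(caller_values)  # the caller's list now holds the new values

caller_values = [0.0, 0.0, 0.0]
result = rebind_local(caller_values, [0.2, 0.3, 0.5])
print(caller_values)  # the caller's list is unchanged
print(result)         # the new values are only available via the return value
```

If a function is expected to update its arguments for the caller, the rebinding form needs the result to be returned (or stored somewhere shared) instead.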
From biopython at maubp.freeserve.co.uk Wed Sep 24 18:24:55 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Wed, 24 Sep 2008 23:24:55 +0100 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <48DAB684.4030103@gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <48DA937B.8000901@gmail.com> <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com> <48DAB684.4030103@gmail.com> Message-ID: <320fb6e00809241524r48ee004xfff68ff5a58ef736@mail.gmail.com> > The not very informative list says it also ''includes a few some minor API > breakage first scheduled in the 1.1 release" > http://scipy.org/scipy/numpy/milestone/1.2.0 > > But this is one of them! There is no message in the numpy 1.1 code. There is > an email on Jul 13, 2008 > (http://www.nabble.com/Newly-deprecated-API-functions-td18436792.html) that > notes these are depreciated and will be scheduled removed for Numpy 1.3. So > rather annoying! Yes - this does seem annoying :( > Sorry, I did not realize until I looked that these were 1.2 related. So the > relevant output is: So in summary, the warnings from numpy 1.2 were multiple cases of the following where the old functions will be removed in numpy 1.3: PyArray_FromDims to PyArray_SimpleNew. PyArray_FromDimsAndDataAndDescr to PyArray_NewFromDescr Bio.Cluster will therefore need updating at some point - the next question is when were PyArray_SimpleNew and PyArray_NewFromDescr introduced... >>> test_MarkovModel fails as I found with BioPython 1.48 (Bug 2589). This is >>> most likely a 64-bit thing with Python 2.5. > > Luckily MarkovModel.py is almost self-contained so I used it independently > of the installation. The test passes as is with numpy and if I drop the [:] > it passes with both Numeric and numpy import statements. Great - I've checked that into CVS now. 
Thanks, Peter From bsouthey at gmail.com Wed Sep 24 21:09:39 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 24 Sep 2008 20:09:39 -0500 Subject: [Biopython-dev] Versions of numpy/Numeric In-Reply-To: <320fb6e00809241524r48ee004xfff68ff5a58ef736@mail.gmail.com> References: <320fb6e00809240958x37aa1e97ka2a569e311e2756b@mail.gmail.com> <48DA937B.8000901@gmail.com> <320fb6e00809241337r2e2edbdeh27b5a793832f762b@mail.gmail.com> <48DAB684.4030103@gmail.com> <320fb6e00809241524r48ee004xfff68ff5a58ef736@mail.gmail.com> Message-ID: On Wed, Sep 24, 2008 at 5:24 PM, Peter wrote: >> The not very informative list says it also ''includes a few some minor API >> breakage first scheduled in the 1.1 release" >> http://scipy.org/scipy/numpy/milestone/1.2.0 >> >> But this is one of them! There is no message in the numpy 1.1 code. There is >> an email on Jul 13, 2008 >> (http://www.nabble.com/Newly-deprecated-API-functions-td18436792.html) that >> notes these are depreciated and will be scheduled removed for Numpy 1.3. So >> rather annoying! > > Yes - this does seem annoying :( > >> Sorry, I did not realize until I looked that these were 1.2 related. So the >> relevant output is: > > So in summary, the warnings from numpy 1.2 were multiple cases of the > following where the old functions will be removed in numpy 1.3: > > PyArray_FromDims to PyArray_SimpleNew. > PyArray_FromDimsAndDataAndDescr to PyArray_NewFromDescr > > Bio.Cluster will therefore need updating at some point - the next > question is when were PyArray_SimpleNew and PyArray_NewFromDescr > introduced... > I have never used the C-API so I looked at what I have available. 
The earliest numpy code I have is 0.9.6 and these are defined in the header file: numpy/core/include/numpy/arrayobject.h These are mentioned in the file numpy/doc/CAPI.txt ('Created: October 2005') present at least in the versions from numpy-1.0.1 to numpy-1.1.1 (I don't see it in the release candidate tarball) : "``PyArray_SimpleNew`` is just a macro for ``PyArray_New`` with default arguments. Use ``PyArray_FILLWBYTE(arr, 0)`` to fill with zeros. The ``PyArray_FromDims`` and family of functions are still available and are loose wrappers around this function. These functions still take ``int *`` arguments. This should be fine on 32-bit systems, but on 64-bit systems you may run into trouble if you frequently passed ``PyArray_FromDims`` the dimensions member of the old ``PyArrayObject`` structure because ``sizeof(npy_intp) != sizeof(int)``. " Regards Bruce From mjldehoon at yahoo.com Thu Sep 25 04:47:10 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Thu, 25 Sep 2008 01:47:10 -0700 (PDT) Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> Message-ID: <63700.34226.qm@web62405.mail.re1.yahoo.com> > Perhaps we should add a __version__ to Bio/__init__.py for > future releases, with the release "script" modified to > ensure this gets incremented to match that used in > setup.py (and Martel). Another solution is that setup.py uses (reads or imports) __init__.py to find out what the version is. For example, this is what matplotlib does in its setup.py script. --Michiel. 
From biopython at maubp.freeserve.co.uk Thu Sep 25 05:22:56 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Thu, 25 Sep 2008 10:22:56 +0100
Subject: [Biopython-dev] determining the version
In-Reply-To: <63700.34226.qm@web62405.mail.re1.yahoo.com>
References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com>
Message-ID: <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com>

On Thu, Sep 25, 2008 at 9:47 AM, Michiel de Hoon wrote:
>
>> Perhaps we should add a __version__ to Bio/__init__.py for
>> future releases, with the release "script" modified to
>> ensure this gets incremented to match that used in
>> setup.py (and Martel).
>
> Another solution is that setup.py uses (reads or imports)
> __init__.py to find out what the version is. For example,
> this is what matplotlib does in its setup.py script.
>

That sounds more sensible - I had been wondering about how that could be automated but it was late last night. From a quick look at the approach taken in the matplotlib code, we could add something like this to setup.py:

    __version__ = "Undefined"
    for line in open('Bio/__init__.py'):
        if (line.startswith('__version__')):
            exec(line.strip())

    setup(
        name='biopython',
        version=__version__,
        author='The Biopython Consortium',
        ...

I'm happy to deal with this if we are agreed that we should add a __version__ to Bio/__init__.py (variations on the naming are possible, but this seems to be a de-facto standard in python libraries).
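
As a variation on the same idea, here is a hedged sketch that pulls the version string out with a regular expression instead of exec, so nothing from the file is executed. It assumes the version is written as a simple quoted string assignment; the names `parse_version` and `read_version` are hypothetical and are not what setup.py actually contains.

```python
import re

# Matches a line such as:  __version__ = "1.49"
VERSION_RE = re.compile(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", re.MULTILINE)


def parse_version(text, default="Undefined"):
    """Return the quoted __version__ value found in text, or default."""
    match = VERSION_RE.search(text)
    if match:
        return match.group(1)
    return default


def read_version(path, default="Undefined"):
    """Read a file (for example Bio/__init__.py) and parse its version."""
    handle = open(path)
    try:
        return parse_version(handle.read(), default)
    finally:
        handle.close()


# Made-up file content, used only to show the behaviour.
example = '"""Package docstring."""\n__version__ = "1.49"\n'
print(parse_version(example))
```

The trade-off is that a regular expression only understands the one literal form it was written for, whereas exec accepts any valid assignment but runs whatever the matching line contains.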
Peter From bugzilla-daemon at portal.open-bio.org Thu Sep 25 07:51:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Thu, 25 Sep 2008 07:51:13 -0400 Subject: [Biopython-dev] [Bug 2251] [PATCH] NumPy support for BioPython In-Reply-To: Message-ID: <200809251151.m8PBpDEr028468@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2251 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-25 07:51 EST ------- We will also need to deal with the deprecation of the following functions in numpy 1.2, which will be removed in numpy 1.3: PyArray_FromDims to PyArray_SimpleNew. PyArray_FromDimsAndDataAndDescr to PyArray_NewFromDescr Quoting the numpy CAPI.txt file (thanks Bruce!), ------ start quote ------- ``PyArray_SimpleNew(nd, dims, typenum)`` is a drop-in replacement for ``PyArray_FromDims`` (except it takes ``npy_intp*`` dims instead of ``int*`` dims which matters on 64-bit systems) and it does not initialize the memory to zero. ``PyArray_SimpleNew`` is just a macro for ``PyArray_New`` with default arguments. Use ``PyArray_FILLWBYTE(arr, 0)`` to fill with zeros. The ``PyArray_FromDims`` and family of functions are still available and are loose wrappers around this function. These functions still take ``int *`` arguments. This should be fine on 32-bit systems, but on 64-bit systems you may run into trouble if you frequently passed ``PyArray_FromDims`` the dimensions member of the old ``PyArrayObject`` structure because ``sizeof(npy_intp) != sizeof(int)``. 
------ end quote ------- Here is a recent example of dealing with this - switching part of scipy and how the pointer issue complicates things: http://projects.scipy.org/pipermail/scipy-dev/2008-August/009581.html http://scipy.org/scipy/scipy/ticket/723 See also http://scipy.org/scipy/numpy/ticket/805 -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bsouthey at gmail.com Thu Sep 25 09:28:03 2008 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 25 Sep 2008 08:28:03 -0500 Subject: [Biopython-dev] determining the version In-Reply-To: <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> References: <320fb6e00809241412r54c2a3a1mc69f3e573f1eaac7@mail.gmail.com> <63700.34226.qm@web62405.mail.re1.yahoo.com> <320fb6e00809250222h3d0d15bw763446b5f0ec44d1@mail.gmail.com> Message-ID: <48DB91E3.3060402@gmail.com> Peter wrote: > On Thu, Sep 25, 2008 at 9:47 AM, Michiel de Hoon wrote: > >>> Perhaps we should add a __version__ to Bio/__init__.py for >>> future releases, with the release "script" modified to >>> ensure this gets incremented to match that used in >>> setup.py (and Martel). >>> >> Another solution is that setup.py uses (reads or imports) >> __init__.py to find out what the version is. For example, >> this is what matplotlib does in its setup.py script. >> >> > > That sounds more sensible - I had been wondering about > how that could be automated but it was late last night. > >From a quick look at approach taken in the matplotlib > code, we could add something like this to setup.py > > __version__ = "Undefined" > for line in open('Bio/__init__.py'): > if (line.startswith('__version__')): > exec(line.strip()) > > setup( > name='biopython', > version=__version__, > author='The Biopython Consortium', > ... 
> > I'm happy to deal with this if we are agreed that we > should add a __version__ to Bio/__init__.py > (variations on the naming are possible, but this seems > to be a de-facto standard in python libraries). > > Peter > > Numpy uses the version.py file to obtain the version and this will also include the svn version if an svn version of numpy is being used. The advantage is that you can follow the developers changes to find when something was fixed or broke. I think the same idea would work for Biopython especially once it moves to svn. For the 1.2.0rc2: >>> numpy.__version__ '1.2.0rc2' Bruce From biopython at maubp.freeserve.co.uk Thu Sep 25 12:15:54 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Sep 2008 17:15:54 +0100 Subject: [Biopython-dev] Sequences and simple plots Message-ID: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> Hi all, I've just added a couple of Bio.SeqIO with pylab examples to the cookbook chapter of the Biopython tutorial. The first shows a histogram of sequence lengths in a FASTA file (based having recently done this for some real assembly data). http://biopython.org/DIST/docs/tutorial/images/hist_plot.png The second is based on the GC% example we used for the BOSC 2008 presentation (see http://biopython.org/wiki/Documentation#Presentations for the original). http://biopython.org/DIST/docs/tutorial/images/gc_plot.png If anyone has any suggestions for similar examples let me know (with code would be great - but even a nice idea is worthwhile). 
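
The calculation behind a GC% plot can be sketched without Bio.SeqIO or pylab. This is only an illustration: `gc_percent` is a helper written for this note (it is not a Biopython function), and the sequences are made up rather than taken from the tutorial's data file.

```python
def gc_percent(sequence):
    """Return the percentage of G and C letters in a nucleotide string."""
    seq = sequence.upper()
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    gc_count = seq.count("G") + seq.count("C")
    return 100.0 * gc_count / len(seq)


# Made-up example sequences keyed by identifier.
example_sequences = {
    "seq_a": "ATGCGC",
    "seq_b": "ATATAT",
    "seq_c": "GGCC",
}

# Sorting the values gives the rising curve used in a simple GC% plot.
gc_values = sorted(gc_percent(seq) for seq in example_sequences.values())
print(gc_values)
```

The sorted list is what would be handed to a plotting call; the sequence lengths for a histogram could be collected the same way with len().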
Peter From biopython at maubp.freeserve.co.uk Thu Sep 25 12:58:37 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Sep 2008 17:58:37 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> Message-ID: <320fb6e00809250958y3605b932w391cab1e7c3507f9@mail.gmail.com> > If anyone has any suggestions for similar examples let me know (with code > would be great - but even a nice idea is worthwhile). How about this example which draws a simple nucleotide dot plot for the first two sequences in the input FASTA file? #Step One, load the first two sequences as input from Bio import SeqIO handle = open("ls_orchid.fasta") record_iterator = SeqIO.parse(handle, "fasta") rec_one = record_iterator.next() rec_two = record_iterator.next() handle.close() print "Comparing %s to %s" % (rec_one.id, rec_two.id) #Step Two, compile a similarity matrix # For simplicity, this is constructed as a list of lists # of booleans (using a mismatch threshold would be more # complicated). Also I'm recording mismatches rather than # matches because that gives a nice image with the pylab # gray colour scheme used later. 
window = 7
seq_one = rec_one.seq.tostring()
seq_two = rec_two.seq.tostring()
# Rows (i) index the second sequence and columns (j) the first,
# so that the matrix matches the y and x axis labels used below.
data = [[(seq_one[j:j+window] != seq_two[i:i+window]) \
        for j in range(len(seq_one)-window)] \
       for i in range(len(seq_two)-window)]

#Step Three, plot using pylab
import pylab
pylab.gray()
pylab.imshow(data)
pylab.xlabel("%s (length %i bp)" % (rec_one.id, len(rec_one)))
pylab.ylabel("%s (length %i bp)" % (rec_two.id, len(rec_two)))
pylab.title("Dot plot using window size %i\n(allowing no mismatches)" \
            % window)
#pylab.show()
pylab.savefig("dot_plot.png", dpi=75)
pylab.savefig("dot_plot.pdf")

Peter

From jflatow at northwestern.edu Thu Sep 25 14:34:00 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Thu, 25 Sep 2008 13:34:00 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
Message-ID: <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>

Hi Peter,

Good ideas for some useful examples! (though I can't actually find them in
the cookbook...)

Anyway I hope you don't mind but while I was looking, I added another
example for SeqIO input/output that uses the new format method:

http://biopython.org/wiki/SeqIO

I tend to prefer this type of method to SeqIO.write, though I don't think
it appears anywhere in the documentation.

Regards,
jared

On Sep 25, 2008, at 11:15 AM, Peter wrote:
> Hi all,
>
> I've just added a couple of Bio.SeqIO with pylab examples to the
> cookbook
> chapter of the Biopython tutorial.
>
> The first shows a histogram of sequence lengths in a FASTA file (based
> having recently done this for some real assembly data).
> http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
>
> The second is based on the GC% example we used for the BOSC 2008
> presentation (see http://biopython.org/wiki/
> Documentation#Presentations
> for the original).
http://biopython.org/DIST/docs/tutorial/images/gc_plot.png > > If anyone has any suggestions for similar examples let me know (with > code > would be great - but even a nice idea is worthwhile). > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev > From thomas at cbs.dtu.dk Thu Sep 25 14:57:52 2008 From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten) Date: Thu, 25 Sep 2008 20:57:52 +0200 Subject: [Biopython-dev] Cleaning up Bio.SeqUtils In-Reply-To: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> Message-ID: <48DBDF30.3060106@cbs.dtu.dk> Hej all, as I am guilty for most of the functions in SeqUtils/__init__.py, I might as well join the cleaning team ... apply_on_multi_fasta and quicker_apply_on_multi_fasta were only functions to the turn the original SeqUtils.py into a possible standalone program, but I guess not many actually used it. On the other hand quick_FASTA_reader was and still is used by a lot of people, despite the irritating splitting bug which occurs if an entry name happens to contain '>' ... Also, the translate and complement functions are from the time were these functions were not easily accessed (we are talking about 2001-2002) In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and the redundant translation machinery could and should get removed. Also if one can change the split function in quick_FASTA_reader? (I don't have had checkin access since a long time) Are there any other dubios functions we should discuss? cheers -thomas -- Sicheritz-Ponten Thomas, Associate Professor, Ph.D ( Head of Metagenomics, Technical University of Denmark \ Center for Biological Sequence Analysis, BioCentrum ) CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##-----> Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas ) / ... 
damn arrow eating trees ... ( Peter wrote: > Dear all, > > I've previously mentioned the idea of cleaning up > Bio/SeqUtils/__init__.py in passing. I've been reminded about this by > Bug 2585 where Sebastian spotted a problem in one of the FASTA related > functions. > http://bugzilla.open-bio.org/show_bug.cgi?id=2585 > > I've updated the docstrings in CVS to describe the three functions > quick_FASTA_reader, apply_on_multi_fasta and > quicker_apply_on_multi_fasta as obsolete but I would like to suggest > going further and deprecating them. > > There are other dubious or redundant functions in > Bio/SeqUtils/__init__.py such as a translate function. Again, would > there be any objection to deprecating this too? > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev From biopython at maubp.freeserve.co.uk Thu Sep 25 15:39:49 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Sep 2008 20:39:49 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> Message-ID: <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> On Thu, Sep 25, 2008 at 7:34 PM, Jared Flatow wrote: > > Hi Peter, > > Good ideas for some useful examples! (though I can't actually find them in > the cookbook...) They are in CVS only at the moment - I can send you the PDF of the current tutorial if you like off list. We don't normally update the tutorial on the website except as part of making a new release - this avoid the tutorial talking about unreleased code. 
> Anyway I hope you don't mind but while I was looking, I added another > example for SeqIO input/output that uses the new format method: > > http://biopython.org/wiki/SeqIO > > I tend to prefer this type of method to SeqIO.write, though I don't think it > appears anywhere in the documentation The format method should be in the Tutorial as of Biopython 1.48 (see the final section of Chapters 4 and 5). Personally I think for your new example using "with" just confuses things, but otherwise mentioning the format() method in this context makes sense. I would probably make it explicit that this with ONLY work for sequential file formats - which is why I prefer to encourage the SeqIO.write() method giving all the records at once (possibly as an iterator). Peter From biopython at maubp.freeserve.co.uk Thu Sep 25 15:50:05 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Thu, 25 Sep 2008 20:50:05 +0100 Subject: [Biopython-dev] Cleaning up Bio.SeqUtils In-Reply-To: <48DBDF30.3060106@cbs.dtu.dk> References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> Message-ID: <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com> On Thu, Sep 25, 2008 at 7:57 PM, Thomas Sicheritz-Ponten wrote: > Hej all, > > as I am guilty for most of the functions in SeqUtils/__init__.py, I might as > well join the cleaning team ... Excellent :) > apply_on_multi_fasta and quicker_apply_on_multi_fasta were only functions to > the turn the original SeqUtils.py into a possible standalone program, but I > guess not many actually used it. That would explain some of that module's style. We could deprecate the standalone bit too when we deprecate these functions. > On the other hand quick_FASTA_reader was and still is used by a lot of > people, despite the irritating splitting bug which occurs if an entry name > happens to contain '>' ... 
We should probably fix that if you think it can be done without loosing the current simplicity and speed (see below). > Also, the translate and complement functions are from the time were these > functions were not easily accessed (we are talking about 2001-2002) That does make sense - its a shame with hindsight that Biopython ended up with several ways to do this. > In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and the > redundant translation machinery could and should get removed. OK. We should probably ask on the main list as a courtesy, and then deprecate them for the next release. > Also if one can change the split function in quick_FASTA_reader? (I don't > have had checkin access since a long time) If this is just an expired account / lost password you could try emailing the OBF support guys directly. If they need someone to vouch for you drop me or Michiel an email off list. In the short term I'm happy to check in a patch on your behalf (by email or via a bug report). > Are there any other dubios functions we should discuss? I'm sure there are more - but that should keep us busy for now :) Are you happy with my recent tweak to the seq3 function (CVS revision 1.15)? I wasn't 100% sure why it had used "Xer" http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py.diff?r1=1.14&r2=1.15&cvsroot=biopython Thanks, Peter From jblanca at btc.upv.es Thu Sep 25 10:49:34 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Thu, 25 Sep 2008 16:49:34 +0200 Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes In-Reply-To: <320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com> References: <200809231440.24684.jblanca@btc.upv.es> <320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com> Message-ID: <200809251649.34934.jblanca@btc.upv.es> Hi: On Tuesday 23 September 2008 16:37:29 Peter wrote: > > SeqRecord still doesn't have a __getitem__ method. 
>
> What do you think of the __getitem__ method proposed in attachment 942
> on Bug 2507?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2507

I've been looking at the patch and it is just what I need.
Using a SeqRecord with that __getitem__ method is almost trivial.
Attached to this email, in mySeqRecord.py, is a possible implementation.
What do you think? For the qualities a tuple of ints would do.
For implementing some details new style classes would be better. Are you
planning to move Seq and SeqRecord to the new style?

Best regards,

--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mySeqRecord.py
Type: application/x-python
Size: 10949 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: seqrecordtest.py
Type: application/x-python
Size: 4587 bytes
Desc: not available
URL:

From thomas at cbs.dtu.dk Thu Sep 25 18:47:58 2008
From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten)
Date: Fri, 26 Sep 2008 00:47:58 +0200
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com>
Message-ID: <48DC151E.3090802@cbs.dtu.dk>

Peter, can you check in the corrected version of quick_FASTA_reader for me?
I added the changes which were suggested in earlier posts (changes not affecting speed and simplicity) def quick_FASTA_reader(file): "simple and quick FASTA reader to be used on large FASTA files" from os import linesep txt = open(file).read() entries = [] splitter = "%s>" % linesep for entry in txt.split(splitter): name,seq= entry.split(linesep,1) if name[0]=='>': name = name[1:] seq = seq.replace('\n','').replace(' ','').upper() entries.append((name, seq)) return entries Concerning the seq3 function, I am not sure where it came from, I don't think I have added it. cheers -thomas Peter wrote: > On Thu, Sep 25, 2008 at 7:57 PM, Thomas Sicheritz-Ponten > wrote: >> Hej all, >> >> as I am guilty for most of the functions in SeqUtils/__init__.py, I might as >> well join the cleaning team ... > > Excellent :) > >> apply_on_multi_fasta and quicker_apply_on_multi_fasta were only functions to >> the turn the original SeqUtils.py into a possible standalone program, but I >> guess not many actually used it. > > That would explain some of that module's style. We could deprecate > the standalone bit too when we deprecate these functions. > >> On the other hand quick_FASTA_reader was and still is used by a lot of >> people, despite the irritating splitting bug which occurs if an entry name >> happens to contain '>' ... > > We should probably fix that if you think it can be done without > loosing the current simplicity and speed (see below). > >> Also, the translate and complement functions are from the time were these >> functions were not easily accessed (we are talking about 2001-2002) > > That does make sense - its a shame with hindsight that Biopython ended > up with several ways to do this. > >> In my opinion, apply_on_multi_fasta, quicker_apply_on_multi_fasta and the >> redundant translation machinery could and should get removed. > > OK. We should probably ask on the main list as a courtesy, and then > deprecate them for the next release. 
> >> Also if one can change the split function in quick_FASTA_reader? (I don't >> have had checkin access since a long time) > > If this is just an expired account / lost password you could try > emailing the OBF support guys directly. If they need someone to vouch > for you drop me or Michiel an email off list. In the short term I'm > happy to check in a patch on your behalf (by email or via a bug > report). > >> Are there any other dubios functions we should discuss? > > I'm sure there are more - but that should keep us busy for now :) > > Are you happy with my recent tweak to the seq3 function (CVS revision > 1.15)? I wasn't 100% sure why it had used "Xer" > > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py.diff?r1=1.14&r2=1.15&cvsroot=biopython > > Thanks, > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Sicheritz-Ponten Thomas, Associate Professor, Ph.D ( Head of Metagenomics, Technical University of Denmark \ Center for Biological Sequence Analysis, BioCentrum ) CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##-----> Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas ) / ... damn arrow eating trees ... ( From biopython at maubp.freeserve.co.uk Fri Sep 26 05:38:57 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 10:38:57 +0100 Subject: [Biopython-dev] Cleaning up Bio.SeqUtils In-Reply-To: <48DC151E.3090802@cbs.dtu.dk> References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com> <48DC151E.3090802@cbs.dtu.dk> Message-ID: <320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com> On Thu, Sep 25, 2008 at 11:47 PM, Thomas Sicheritz-Ponten wrote: > Peter, can you check in the corrected version of quick_FASTA_reader for me? 
> I added the changes which were suggested in earlier posts (changes not
> affecting speed and simplicity)
>
> def quick_FASTA_reader(file):
>     "simple and quick FASTA reader to be used on large FASTA files"
>     from os import linesep
>     txt = open(file).read()
>     entries = []
>     splitter = "%s>" % linesep
>     for entry in txt.split(splitter):
>         name,seq= entry.split(linesep,1)
>         if name[0]=='>': name = name[1:]
>         seq = seq.replace('\n','').replace(' ','').upper()
>         entries.append((name, seq))
>     return entries

I'm pretty sure we shouldn't be using os.linesep in this way. I'd
have to double check on a Windows box to confirm this, but I believe
from memory that any CRLF in the file becomes just a \n in Python.

The basic idea is we want to split on "\n>" so that any additional ">"
inside a name is ignored. This then means the first record in the
file is a special case. You've also added an extra if statement in
the loop - I assume to cope with the fact that using a split on "\n>"
would leave a leading ">" on the first record's name -- but this would
go wrong if the name itself started with a ">" too (i.e. a line
starting with ">>..." which would be unusual).

Perhaps instead, as a typical FASTA file starts immediately with ">"
we can just do the split on "\n"+contents of file. I've updated CVS
based on this, and added a minimal test for quick_FASTA_reader (and
GC) to test_SeqUtils.py as well.

Checking in Bio/SeqUtils/__init__.py;
/home/repository/biopython/biopython/Bio/SeqUtils/__init__.py,v  <--  __init__.py
new revision: 1.17; previous revision: 1.16
done
Checking in Tests/test_SeqUtils.py;
/home/repository/biopython/biopython/Tests/test_SeqUtils.py,v  <--  test_SeqUtils.py
new revision: 1.2; previous revision: 1.1
done
Checking in Tests/output/test_SeqUtils;
/home/repository/biopython/biopython/Tests/output/test_SeqUtils,v  <--  test_SeqUtils
new revision: 1.2; previous revision: 1.1
done

Could you have a look at Bio/SeqUtils/__init__.py revision 1.17 for review?
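
[Illustration] The "\n>" splitting idea described above could be sketched roughly as follows. This is only a sketch under stated assumptions and not the code checked into CVS as revision 1.17: the function name quick_fasta_records is hypothetical, and it takes the file contents as a string (rather than a filename) so it can be tried without a file on disk.

```python
def quick_fasta_records(text):
    """Split FASTA-formatted text into (name, sequence) tuples.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    # Normalise line endings so only "\n" has to be considered below.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    entries = []
    # Prepending "\n" means the first record is split like all the others,
    # and splitting on "\n>" ignores any ">" inside a record name.
    for entry in ("\n" + text).split("\n>")[1:]:
        name, _, seq = entry.partition("\n")
        seq = seq.replace("\n", "").replace(" ", "").upper()
        entries.append((name, seq))
    return entries


# One name containing ">" and one name starting with ">" (a ">>..." line).
example = ">seq1 A>B variant\nacgt\nac gt\n>>odd name\nGGCC\n"
records = quick_fasta_records(example)
```

With this approach no special case is needed for the first record, and a ">" inside or at the start of a name is kept as part of the name.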
It will be up on ViewCVS shortly... http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py?cvsroot=biopython Do you think I should remove the "OBSOLETE" tag in the docstring for the quick_FASTA_reader function? > Concerning the seq3 function, I am not sure where it came from, I don't > think I have added it. OK, thanks. Peter From biopython at maubp.freeserve.co.uk Fri Sep 26 05:50:39 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 10:50:39 +0100 Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes In-Reply-To: <200809251649.34934.jblanca@btc.upv.es> References: <200809231440.24684.jblanca@btc.upv.es> <320fb6e00809230737h223e6e3dgac6bf0fbbf4af41@mail.gmail.com> <200809251649.34934.jblanca@btc.upv.es> Message-ID: <320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com> On Thu, Sep 25, 2008 at 3:49 PM, Jose Blanca wrote: > Hi: > > On Tuesday 23 September 2008 16:37:29 Peter wrote: >> > SeqRecord still doesn't have a __getitem__ method. >> >> What do you think of the __getitem__ method proposed in attachment 942 >> on Bug 2507? >> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 > I've been looking at the patch and is just what I need. > Using a SeqRecord with that __getitem__ method is almost trivial. Good :) I'd like to check this into CVS but it would be best to have a third person comment on the code first. Once (if) this is included, I would then plan to use this for slicing alignment objects (Bug 2551) http://bugzilla.open-bio.org/show_bug.cgi?id=2551 > Attach to this email inside mySeqRecord.py is a possible implementation. > What do you think? For the qualities a tuple of ints would do. I see you have created a subclass the SeqRecord to add a quality property, and made sure this gets sliced too in the __getitem__. This is a nice approach (and demonstrates how people could extend the basic Biopython objects in their own code). 
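
[Illustration] The idea of slicing a sequence and its quality scores together might look something like the sketch below. It is a standalone, hypothetical class (ReadWithQuality is an invented name), not the scrubbed mySeqRecord.py attachment and not a Biopython class; it assumes the sequence is a plain string and the qualities any Python sequence of numbers.

```python
class ReadWithQuality(object):
    """Hypothetical record holding a sequence plus per-letter quality scores.

    For illustration only; this is not part of Biopython.
    """

    def __init__(self, seq, quality, name=""):
        quality = tuple(quality)
        # The quality scores must line up one-to-one with the sequence.
        if len(seq) != len(quality):
            raise ValueError("sequence has %i letters but %i quality values"
                             % (len(seq), len(quality)))
        self.seq = seq
        self.quality = quality
        self.name = name

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice the sequence and the quality together, keeping the name.
            return ReadWithQuality(self.seq[index], self.quality[index],
                                   self.name)
        # A single position returns the (letter, quality) pair.
        return self.seq[index], self.quality[index]


read = ReadWithQuality("ACGTAC", [30, 31, 12, 40, 38, 9], name="read1")
sub = read[1:4]
```

Here sub keeps the name of the parent read, and both its seq and quality cover positions 1 to 3 only.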
I would also suggest in the __init__ method checking that the quality sequence is the same length as the sequence itself. Your code looks like it would cope with any python sequence object (string, list, tuple) for the quality, and you could use integers or floats here. Very flexible. If we were to add something like this to Biopython directly, I prefer "quality" over "qual" (just three letters longer but much clearer). I would also consider adding the quality to the Seq object (subclassing the Seq object rather than the SeqRecord object). My reasoning is that for 454 or Solexa sequencing, you will have thousands of reads and all you really care about is the nucleotide sequence and the quality scores. Unless you want to give them all unique names, there little point having the overhead of the various annotation properties of the SeqRecord. > For implementing some details new style classes would be better. Are you > planning to move Seq and SeqRecord to the new style? If we have a good reason to - adding docstrings to the properties would be nice. Peter From thomas at cbs.dtu.dk Fri Sep 26 05:54:12 2008 From: thomas at cbs.dtu.dk (Thomas Sicheritz-Ponten) Date: Fri, 26 Sep 2008 11:54:12 +0200 Subject: [Biopython-dev] Cleaning up Bio.SeqUtils In-Reply-To: <320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com> References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com> <48DC151E.3090802@cbs.dtu.dk> <320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com> Message-ID: <48DCB144.9020801@cbs.dtu.dk> Ok, fair enough :-) Please remove also the OBSOLETE tag - as Bio.SeqIO.parse is not really a substitution for quick_FASTA_reader cheers -thomas Peter wrote: > On Thu, Sep 25, 2008 at 11:47 PM, Thomas Sicheritz-Ponten > wrote: >> Peter, can you check in the corrected version of quick_FASTA_reader for me? 
>> I added the changes which were suggested in earlier posts (changes not >> affecting speed and simplicity) >> >> def quick_FASTA_reader(file): >> "simple and quick FASTA reader to be used on large FASTA files" >> from os import linesep >> txt = open(file).read() >> entries = [] >> splitter = "%s>" % linesep >> for entry in txt.split(splitter): >> name,seq= entry.split(linesep,1) >> if name[0]=='>': name = name[1:] >> seq = seq.replace('\n','').replace(' ','').upper() >> entries.append((name, seq)) >> return entries > > I'm pretty sure we shouldn't be using os.linesep in this way. I'd > have to double check on a Windows box to confirm this, but I believe > from memory that any CRLF in the file becomes just a \n in python. > > The basic idea is we want to split on "\n>" so that any additional ">" > inside a name are ignored. This than means the first record in the > file is a special case. You've also added an extra if statement in > the loop - I assume to cope with the fact that using a split on "\n>" > would leave a leading ">" on the first record's name -- but this would > go wrong if the name itself started with a ">" too (i.e. a line > starting with ">>..." which would be unusual). > > Perhaps instead, as a typical FASTA file starts immediately with ">" > we can just do the split on "\n"+contents of file. I've updated CVS > based on this, and added a minimal test for quick_FASTA_reader (and > GC) to test_SeqUtils.py as well. 
> > Checking in Bio/SeqUtils/__init__.py; > /home/repository/biopython/biopython/Bio/SeqUtils/__init__.py,v <-- > __init__.py > new revision: 1.17; previous revision: 1.16 > done > Checking in Tests/test_SeqUtils.py; > /home/repository/biopython/biopython/Tests/test_SeqUtils.py,v <-- > test_SeqUtils.py > new revision: 1.2; previous revision: 1.1 > done > Checking in Tests/output/test_SeqUtils; > /home/repository/biopython/biopython/Tests/output/test_SeqUtils,v <-- > test_SeqUtils > new revision: 1.2; previous revision: 1.1 > done > > Could you have a look at Bio/SeqUtils/__init__.py revision 1.17 for > review? It will be up on ViewCVS shortly... > http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py?cvsroot=biopython > > Do you think I should remove the "OBSOLETE" tag in the docstring for > the quick_FASTA_reader function? > >> Concerning the seq3 function, I am not sure where it came from, I don't >> think I have added it. > > OK, thanks. > > Peter > _______________________________________________ > Biopython-dev mailing list > Biopython-dev at lists.open-bio.org > http://lists.open-bio.org/mailman/listinfo/biopython-dev -- Sicheritz-Ponten Thomas, Associate Professor, Ph.D ( Head of Metagenomics, Technical University of Denmark \ Center for Biological Sequence Analysis, BioCentrum ) CBS: +45 45 252422 Building 208, DK-2800 Lyngby ##-----> Fax: +45 45 931585 http://www.cbs.dtu.dk/~thomas ) / ... damn arrow eating trees ... 
( From biopython at maubp.freeserve.co.uk Fri Sep 26 06:08:04 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 11:08:04 +0100 Subject: [Biopython-dev] Cleaning up Bio.SeqUtils In-Reply-To: <48DCB144.9020801@cbs.dtu.dk> References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com> <48DC151E.3090802@cbs.dtu.dk> <320fb6e00809260238n41e027c7g877bc040b49cb1e4@mail.gmail.com> <48DCB144.9020801@cbs.dtu.dk> Message-ID: <320fb6e00809260308q58659aech1b5671ce76f1eeef@mail.gmail.com> On Fri, Sep 26, 2008 at 10:54 AM, Thomas Sicheritz-Ponten wrote: > Ok, fair enough :-) > Please remove also the OBSOLETE tag - as Bio.SeqIO.parse is not really a > substitution for quick_FASTA_reader OK, I've done that and reworded the docstring. I agree that Bio.SeqIO is not a direct substitute for quick_FASTA_reader but they both have their plus points. I'll send out an email to the main list about deprecating the following: Using Bio/SeqUtils as a script Bio.SeqUtils.apply_on_multi_fasta Bio.SeqUtils.quicker_apply_on_multi_fasta Bio.SeqUtils.translate What about fasta_uniqids? It reads a file but prints to screen which doesn't seem useful in a python script. Peter From biopython at maubp.freeserve.co.uk Fri Sep 26 06:15:52 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 11:15:52 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> Message-ID: <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com> On Thu, Sep 25, 2008 at 8:39 PM, Peter wrote: > On Thu, Sep 25, 2008 at 7:34 PM, Jared Flatow wrote: >> >> Hi Peter, >> >> Good ideas for some useful examples! 
(though I can't actually find them in
>> the cookbook...)
>
> They are in CVS only at the moment - I can send you the PDF of the
> current tutorial if you like off list. We don't normally update the
> tutorial on the website except as part of making a new release - this
> avoid the tutorial talking about unreleased code.

Cut and pasted here for people to comment on directly:

The first shows a histogram of sequence lengths in a FASTA file (based
on having recently done this for some real assembly data). Sample output:
http://biopython.org/DIST/docs/tutorial/images/hist_plot.png

from Bio import SeqIO
handle = open("ls_orchid.fasta")
sizes = [len(seq_record) for seq_record in SeqIO.parse(handle, "fasta")]
handle.close()

import pylab
pylab.hist(sizes, bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
            % (len(sizes),min(sizes),max(sizes)))
pylab.xlabel("Sequence length (bp)")
pylab.ylabel("Count")
pylab.show()

The second is based on the GC% example we used for the BOSC 2008
presentation:
http://biopython.org/DIST/docs/tutorial/images/gc_plot.png

from Bio import SeqIO
from Bio.SeqUtils import GC
handle = open("ls_orchid.fasta")
gc_values = [GC(seq_record.seq) for seq_record in SeqIO.parse(handle, "fasta")]
gc_values.sort()
handle.close()

import pylab
pylab.plot(gc_values)
pylab.title("%i orchid sequences\nGC%% %0.1f to %0.1f" \
            % (len(gc_values),min(gc_values),max(gc_values)))
pylab.xlabel("Genes")
pylab.ylabel("GC%")
pylab.show()

Peter

From biopython at maubp.freeserve.co.uk Fri Sep 26 07:02:00 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 12:02:00 +0100
Subject: [Biopython-dev] Cleaning up Bio.SeqUtils
In-Reply-To: <48DC151E.3090802@cbs.dtu.dk>
References: <320fb6e00809170723w95c87b0r94c4b12574ce174f@mail.gmail.com> <48DBDF30.3060106@cbs.dtu.dk> <320fb6e00809251250l4f905490i328731c741c1dfc8@mail.gmail.com> <48DC151E.3090802@cbs.dtu.dk>
Message-ID: <320fb6e00809260402k612b0465uf0326d5c5cb48dff@mail.gmail.com>

>> Are you happy with my
recent tweak to the seq3 function (CVS revision >> 1.15)? I wasn't 100% sure why it had used "Xer" It just occurred to me this could be short for "X error"? > Concerning the seq3 function, I am not sure where it came from, I don't > think I have added it. > Looking over the CVS logs, I think it might have been you (CVS user "thomas") - but it was six years ago. See Bio/SeqUtils/__init__.py revision 1.2 http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/SeqUtils/__init__.py?cvsroot=biopython The comments say Bio.SeqUtils.seq3 was inspired by BioPerl. I've only skimmed the BioPerl SVN history, but they do seem to use "Xaa" and not "Xer", http://code.open-bio.org/svnweb/index.cgi/bioperl/view/bioperl-live/trunk/Bio/SeqUtils.pm Peter From bugzilla-daemon at portal.open-bio.org Fri Sep 26 08:44:16 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 08:44:16 -0400 Subject: [Biopython-dev] [Bug 2425] Fasta ID parsing error In-Reply-To: Message-ID: <200809261244.m8QCiGji013606@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2425 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 08:44 EST ------- (In reply to comment #1) > I assume in your example you expected "region1.fasta.screen.Contig1" to be > used as the record key in BioSQL? There is a 40 character limit on this > field, which should be fine for most FASTA identifiers. In BioSQL v1.0.1, fields bioentry.accession and dbxref.accession were increased from 40 to 128 characters. See http://lists.open-bio.org/pipermail/biosql-l/2008-August/001311.html However, bioentry.name is still only 40 characters. It looks like for a FASTA file like this: >gi|9629357|ref|NC_001802.1| Human immunodeficiency virus type 1, complete genome GGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCC TCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGA ... 
BioPerl will use "gi|9629357|ref|NC_001802.1|" as bioentry.name and bioentry.identifier with "Human immunodeficiency virus type 1, complete genome" as bioentry.description, 0 as the version (BioSQL convention when unknown), with bioentry.taxon_id and bioentry.division as NULL. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jblanca at btc.upv.es Fri Sep 26 09:18:02 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 26 Sep 2008 15:18:02 +0200 Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes In-Reply-To: <320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com> References: <200809231440.24684.jblanca@btc.upv.es> <200809251649.34934.jblanca@btc.upv.es> <320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com> Message-ID: <200809261518.02453.jblanca@btc.upv.es> Hi: > I see you have created a subclass the SeqRecord to add a quality > property, and made sure this gets sliced too in the __getitem__. This > is a nice approach (and demonstrates how people could extend the basic > Biopython objects in their own code). I would also suggest in the > __init__ method checking that the quality sequence is the same length > as the sequence itself. To do that in a proper way I would like to use property, that's why I was asking for the possibility of transforming SeqRecord and Seq in new style classes. > If we were to add something like this to Biopython directly, I prefer > "quality" over "qual" (just three letters longer but much clearer). That's not a problem. I used qual to do it similar to .seq > I would also consider adding the quality to the Seq object (subclassing > the Seq object rather than the SeqRecord object). My reasoning is > that for 454 or Solexa sequencing, you will have thousands of reads > and all you really care about is the nucleotide sequence and the > quality scores. 
Unless you want to give them all unique names, there > little point having the overhead of the various annotation properties > of the SeqRecord. I didn't subclass Seq because if we want a quality without name we could just use a tuple or a list. My idea was to create a class with two main properties, seq and qual (or quality). Seq does not has a seq property, it is a sequence. Since SeqRecord already has a seq property I subclassed it adding the qual property. Another alternative would be to create a new SeqWithQuality class without subclassing SeqRecord. I looked at the BioPerl model. They have several classes dealing with sequences and qualities: Seq: - has a seq property (unlike BioPython's Seq that is a sequence and has no seq property). Besides has and id or a name. Qual: - has a qual property, and an id or a name. SeqWithQual: - has a seq and Qual properties. I didn't create a Qual class with a qual property and a name because there is no Seq class with a seq an a name. I thought that a tuple or a list of ints would be equivalent to BioPython's Seq and would take the part of the BioPerl's Qual. What do you think about this model? I agree that this classes should be prepared to deal with a lot of sequences and they should be efficient. But I don't have the experience to foresee which model would be better in that regard. -- Jose M. 
Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From bugzilla-daemon at portal.open-bio.org Fri Sep 26 09:30:13 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 09:30:13 -0400 Subject: [Biopython-dev] [Bug 2425] Fasta ID parsing error In-Reply-To: Message-ID: <200809261330.m8QDUDwJ016360@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2425 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |RESOLVED Resolution| |FIXED ------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 09:30 EST ------- OK, I think this is fixed in CVS now. I have also updated the test_BioSQL_SeqIO.py unit test to check importing and retrieving a range of different FASTA files. Of course, having a second person double check this works would be great. Feel free to comment here (or reopen the bug) as appropriate. Peter -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
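The identifier handling discussed under Bug 2425 can be illustrated with a short sketch. It assumes the FASTA title line is split at the first run of whitespace, so the first word becomes the identifier and the remainder the description, as in the BioPerl behaviour described above. The names split_fasta_title, fits_name_field and NAME_LIMIT are hypothetical and are not part of Biopython or BioSQL; the 40 character width is the bioentry.name limit quoted in the thread.

```python
# Hypothetical helpers for illustration only; not Biopython or BioSQL API.
NAME_LIMIT = 40  # bioentry.name width quoted in the thread (assumed here)


def split_fasta_title(title):
    """Return (identifier, description) for a FASTA '>' title line."""
    text = title.lstrip(">").strip()
    parts = text.split(None, 1)  # split once, at the first run of whitespace
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return identifier, description


def fits_name_field(identifier, limit=NAME_LIMIT):
    """True if the identifier would fit in a field of the given width."""
    return len(identifier) <= limit


title = ">gi|9629357|ref|NC_001802.1| Human immunodeficiency virus type 1, complete genome"
identifier, description = split_fasta_title(title)
print(identifier)
print(description)
print(fits_name_field(identifier))
```

An identifier longer than the limit would need truncating or storing in a wider field, which is the kind of decision the bug report is about.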
From jflatow at northwestern.edu  Fri Sep 26 09:40:46 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 26 Sep 2008 08:40:46 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
 <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
 <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
 <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
Message-ID:

On Sep 26, 2008, at 5:15 AM, Peter wrote:

> Cut and paste for people to comment on directly,

Ok, cool.

> The first shows a histogram of sequence lengths in a FASTA file (based
> on having recently done this for some real assembly data). Sample
> output:
> http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
>
> from Bio import SeqIO
> handle = open("ls_orchid.fasta")
> sizes = [len(seq_record) for seq_record in SeqIO.parse(handle, "fasta")]
> handle.close()
>
> import pylab
> pylab.hist(sizes, bins=20)
> pylab.title("%i orchid sequences\nLengths %i to %i" \
>             % (len(sizes),min(sizes),max(sizes)))
> pylab.xlabel("Sequence length (bp)")
> pylab.ylabel("Count")
> pylab.show()

It's a perfectly fine example, my only comment would be to do
something like this:

seqs = list(SeqIO.parse(handle, 'fasta'))
hist([len(seq) for seq in seqs], bins=20)

I like to keep the whole sequences in memory, especially if I am just
digging around the data. Also I use the alpha parameter a lot for
histograms, especially when doing overlapping ones. So then you can
also do something like this:

hist([len(seq) for seq in seqs if GC(seq.seq) < .5], bins=20, alpha=.5, fc='r')
hist([len(seq) for seq in seqs if GC(seq.seq) >= .5], bins=20, alpha=.5, fc='b')

> The second is based on the GC% example we used for the BOSC 2008
> presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
>
> from Bio import SeqIO
> from Bio.SeqUtils import GC
> handle = open("ls_orchid.fasta")
> gc_values = [GC(seq_record.seq) for seq_record in SeqIO.parse(handle, "fasta")]
> gc_values.sort()
> handle.close()
>
> import pylab
> pylab.plot(gc_values)
> pylab.title("%i orchid sequences\nGC%% %0.1f to %0.1f" \
>             % (len(gc_values),min(gc_values),max(gc_values)))
> pylab.xlabel("Genes")
> pylab.ylabel("GC%")
> pylab.show()

Again, if you had all the sequences in a list:

plot(sorted(GC(seq.seq) for seq in seqs))

jared

From biopython at maubp.freeserve.co.uk  Fri Sep 26 09:43:25 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 14:43:25 +0100
Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes
In-Reply-To: <200809261518.02453.jblanca@btc.upv.es>
References: <200809231440.24684.jblanca@btc.upv.es>
 <200809251649.34934.jblanca@btc.upv.es>
 <320fb6e00809260250r66422454g2a5ec665330dd934@mail.gmail.com>
 <200809261518.02453.jblanca@btc.upv.es>
Message-ID: <320fb6e00809260643w1534f237g4b5314b9a884191e@mail.gmail.com>

Hi Jose,

>> I see you have created a subclass [of] the SeqRecord to add a quality
>> property, and made sure this gets sliced too in the __getitem__. This
>> is a nice approach (and demonstrates how people could extend the basic
>> Biopython objects in their own code). I would also suggest in the
>> __init__ method checking that the quality sequence is the same length
>> as the sequence itself.
>
> To do that in a proper way I would like to use property, that's why I was
> asking for the possibility of transforming SeqRecord and Seq in new style
> classes.

Oh I see - then you could put the length check in the property set method?

Would you like to file an enhancement bug for transforming SeqRecord
and Seq into new style classes, and prepare a patch (for this only)?
If this doesn't cause any problems with the unit tests then I don't foresee any problems getting that change made. >> If we were to add something like this to Biopython directly, I prefer >> "quality" over "qual" (just three letters longer but much clearer). > > That's not a problem. I used qual to do it similar to .seq Style is often debatable. Sequence is quite long, and seq is fairly clear. Qual on the other hand could be short for qualifier (a term used in feature annotation). >> I would also consider adding the quality to the Seq object (subclassing >> the Seq object rather than the SeqRecord object). My reasoning is >> that for 454 or Solexa sequencing, you will have thousands of reads >> and all you really care about is the nucleotide sequence and the >> quality scores. Unless you want to give them all unique names, there >> little point having the overhead of the various annotation properties >> of the SeqRecord. > > I didn't subclass Seq because if we want a quality without name we could just > use a tuple or a list. My idea was to create a class with two main > properties, seq and qual (or quality). ... > I agree that this classes should be prepared to deal with a lot of sequences > and they should be efficient. But I don't have the experience to foresee > which model would be better in that regard. I haven't had to deal with 454 or solexa sequence data yet (but I am hoping to in the next six months). Given there are lots of possible implementation/object structure ideas, I think it might be premature to pick one for Biopython right now. Would you be happy with the SeqRecord __getitem__ method (Bug 2507) and creating the subclassed SeqRecord with quality in your own code? If you find that works well in real usage, it would be encouraging for us to use it Biopython. Or have you already been using something like this for serious data analysis? 
Peter From jblanca at btc.upv.es Fri Sep 26 10:16:00 2008 From: jblanca at btc.upv.es (Jose Blanca) Date: Fri, 26 Sep 2008 16:16:00 +0200 Subject: [Biopython-dev] about the SeqRecord and SeqFeature classes In-Reply-To: <320fb6e00809260643w1534f237g4b5314b9a884191e@mail.gmail.com> References: <200809231440.24684.jblanca@btc.upv.es> <200809261518.02453.jblanca@btc.upv.es> <320fb6e00809260643w1534f237g4b5314b9a884191e@mail.gmail.com> Message-ID: <200809261616.00911.jblanca@btc.upv.es> Hi: > > To do that in a proper way I would like to use property, that's why I was > > asking for the possibility of transforming SeqRecord and Seq in new style > > classes. > > Oh I see - then you could put the length check in the property set method? That's exactly right. > Would you like to file an enhancement bug for transforming SeqRecord > and Seq into new style classes, and prepare a patch (for this only)? > If this doesn't cause any problems with the unit tests then I don't > foresee any problems getting that change made. I will, although first I have to look how to do it. I think that I have to take a look at your developer docs. > Style is often debatable. Sequence is quite long, and seq is fairly > clear. Qual on the other hand could be short for qualifier (a term > used in feature annotation). I see, you've got a point there. > I haven't had to deal with 454 or solexa sequence data yet (but I am > hoping to in the next six months). I'm exactly working on that right now. > Given there are lots of possible > implementation/object structure ideas, I think it might be premature > to pick one for Biopython right now. Would you be happy with the > SeqRecord __getitem__ method (Bug 2507) and creating the subclassed > SeqRecord with quality in your own code? If you find that works well > in real usage, it would be encouraging for us to use it Biopython. That's a great way to do it. > Or > have you already been using something like this for serious data > analysis? Not yet. 
Best regards, -- Jose M. Blanca Postigo Instituto Universitario de Conservacion y Mejora de la Agrodiversidad Valenciana (COMAV) Universidad Politecnica de Valencia (UPV) Edificio CPI (Ciudad Politecnica de la Innovacion), 8E 46022 Valencia (SPAIN) Tlf.:+34-96-3877000 (ext 88473) From bugzilla-daemon at portal.open-bio.org Fri Sep 26 11:11:36 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 11:11:36 -0400 Subject: [Biopython-dev] [Bug 2507] Adding __getitem__ to SeqRecord for element access and slicing In-Reply-To: Message-ID: <200809261511.m8QFBaOG024019@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2507 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #942 is|0 |1 obsolete| | ------- Comment #11 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 11:11 EST ------- Created an attachment (id=998) --> (http://bugzilla.open-bio.org/attachment.cgi?id=998&action=view) Updated patch to SeqRecord.py and SeqFeature.py This updates the patch to work on the current code in CVS (the new format method has been committed since). This also makes a small but subtle change to checking the end point of each feature to determine if is should be included when generating a sub-record. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
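The two ideas from the SeqRecord thread above, a length check in a property setter and slicing the quality scores together with the sequence in __getitem__, can be sketched in a few lines. This is a toy class under stated assumptions, not Biopython's SeqRecord and not the patch attached to Bug 2507: the name RecordWithQuality is hypothetical, the sequence is a plain string, and only slices are supported. In Python 2 a property setter only takes effect on a new style class (one derived from object), which is why the conversion was requested.

```python
# Hypothetical class for illustration only; it is not Biopython's SeqRecord.
class RecordWithQuality(object):
    """A sequence string paired with one quality score per letter."""

    def __init__(self, seq, quality, name="<unknown>"):
        self.seq = seq
        self.name = name
        self.quality = quality  # goes through the property setter below

    @property
    def quality(self):
        return self._quality

    @quality.setter
    def quality(self, values):
        # The length check lives here, so it also runs on later assignments.
        values = list(values)
        if len(values) != len(self.seq):
            raise ValueError("quality has %i values but sequence has %i letters"
                             % (len(values), len(self.seq)))
        self._quality = values

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Only slices are handled in this sketch; both parts are cut together.
        if not isinstance(index, slice):
            raise TypeError("this sketch only supports slicing")
        return RecordWithQuality(self.seq[index], self.quality[index], self.name)


read = RecordWithQuality("ACGTAC", [30, 31, 20, 12, 40, 38], name="read1")
sub = read[1:4]
print(sub.seq, sub.quality)
```

Because the sub-record is built through the same constructor, the length check is repeated on every slice, which is cheap here but might matter for very large read sets.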
From bugzilla-daemon at portal.open-bio.org  Fri Sep 26 11:52:50 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 11:52:50 -0400
Subject: [Biopython-dev] [Bug 2596] New: Add string like strip, rstrip and lstrip methods to the Seq object
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2596

           Summary: Add string like strip, rstrip and lstrip methods to the
                    Seq object
           Product: Biopython
           Version: Not Applicable
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk
OtherBugsDependingOnThis: 2351

As part of Bug 2351 to make the Seq object more string like, it would be nice
to add strip, rstrip and lstrip methods to the Seq object. The returned Seq
object will have the same alphabet as the parent sequence.

While for strings defaulting to removing white space (spaces, tabs, newlines)
makes sense, for sequences there shouldn't be any white space. I think
defaulting to the gap character is more natural here.

Possible implementation:

    def strip(self, chars=None) :
        """Returns a new Seq object with leading and trailing ends stripped.

        Optional argument chars defines which characters to remove.  If
        omitted or None (default) the gap character will be used (if defined
        for the alphabet, otherwise defaulting to "-").  In comparison, the
        string strip method will default to removing white space."""
        if chars is None :
            try :
                chars = self.alphabet.gap_char
            except AttributeError :
                chars = "-"
        return Seq(str(self).strip(chars), self.alphabet)

    def lstrip(self, chars=None) :
        """Returns a new Seq object with leading (left) end stripped.

        Optional argument chars defines which characters to remove.  If
        omitted or None (default) the gap character will be used (if defined
        for the alphabet, otherwise defaulting to "-").  In comparison, the
        string lstrip method will default to removing white space."""
        if chars is None :
            try :
                chars = self.alphabet.gap_char
            except AttributeError :
                chars = "-"
        return Seq(str(self).lstrip(chars), self.alphabet)

    def rstrip(self, chars=None) :
        """Returns a new Seq object with trailing (right) end stripped.

        Optional argument chars defines which characters to remove.  If
        omitted or None (default) the gap character will be used (if defined
        for the alphabet, otherwise defaulting to "-").  In comparison, the
        string rstrip method will default to removing white space."""
        if chars is None :
            try :
                chars = self.alphabet.gap_char
            except AttributeError :
                chars = "-"
        return Seq(str(self).rstrip(chars), self.alphabet)

From bugzilla-daemon at portal.open-bio.org  Fri Sep 26 11:52:56 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Fri, 26 Sep 2008 11:52:56 -0400
Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string?
In-Reply-To:
Message-ID: <200809261552.m8QFquJH026306@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2351

biopython-bugzilla at maubp.freeserve.co.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  BugsThisDependsOn|                            |2596
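The default proposed in Bug 2596 above (strip the alphabet's gap character, falling back to "-") can be tried without Biopython. The sketch below uses toy stand-ins: ToySeq, GappedAlphabet and PlainAlphabet are hypothetical names, not Bio.Seq.Seq or Bio.Alphabet classes, and getattr with a default is used in place of the try/except in the proposal, which should behave the same way for a missing gap_char attribute.

```python
# Toy stand-ins for illustration; these are not Bio.Seq.Seq or Bio.Alphabet.
class GappedAlphabet(object):
    gap_char = "."


class PlainAlphabet(object):
    pass  # no gap_char attribute, so "-" is used as the fallback


class ToySeq(object):
    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __str__(self):
        return self.data

    def strip(self, chars=None):
        """Strip both ends, defaulting to the alphabet's gap character."""
        if chars is None:
            chars = getattr(self.alphabet, "gap_char", "-")
        return ToySeq(str(self).strip(chars), self.alphabet)


print(ToySeq("--ACG-T--", PlainAlphabet()).strip())    # internal gap is kept
print(ToySeq("..ACGT..", GappedAlphabet()).strip())    # alphabet-defined gap
print(ToySeq("NNACGTNN", PlainAlphabet()).strip("N"))  # explicit characters
```

As with the string method, only the ends are affected, so gaps inside an alignment row survive.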
From biopython at maubp.freeserve.co.uk  Fri Sep 26 12:11:50 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 17:11:50 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To:
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
 <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
 <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
 <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
Message-ID: <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>

On Fri, Sep 26, 2008 at 2:40 PM, Jared Flatow wrote:
>
> On Sep 26, 2008, at 5:15 AM, Peter wrote:
>
>> Cut and paste for people to comment on directly,
>
> Ok, cool.
>
>> The first shows a histogram of sequence lengths in a FASTA file (based
>> on having recently done this for some real assembly data). Sample output:
>> http://biopython.org/DIST/docs/tutorial/images/hist_plot.png
>>
>> ...
>
> It's a perfectly fine example, my only comment would be to do something like
> this:
>
> seqs = list(SeqIO.parse(handle, 'fasta'))
> hist([len(seq) for seq in seqs], bins=20)
>
> I like to keep the whole sequences in memory, especially if I am just
> digging around the data.

I see what you mean - and maybe that is more realistic.

One change I'd make is avoiding using seq or seqs as variable names
for SeqRecord objects. I've generally tried to use record and records in
the documentation.

i.e. maybe like this:

import pylab
from Bio import SeqIO
records = list(SeqIO.parse(open("ls_orchid.fasta"), "fasta"))

#Histogram of lengths
pylab.hist([len(record) for record in records], bins=20)
pylab.title("%i orchid sequences\nLengths %i to %i" \
            % (len(sizes),min(sizes),max(sizes)))
pylab.xlabel("Sequence length (bp)")
pylab.ylabel("Count")
pylab.show()

> Also I use the alpha parameter a lot for histograms, especially when
> doing overlapping ones. So then you can also do something like this:
>
> hist([len(seq) for seq in seqs if GC(seq.seq) < .5], bins=20, alpha=.5,
> fc='r')
> hist([len(seq) for seq in seqs if GC(seq.seq) >= .5], bins=20, alpha=.5,
> fc='b')
>

Fun. I didn't want to get into anything too advanced on the pylab
side, rather I wanted to focus on the bioinformatics. Does anyone
else think more advanced graphical demonstrations would be worthwhile?

>> The second is based on the GC% example we used for the BOSC 2008
>> presentation: http://biopython.org/DIST/docs/tutorial/images/gc_plot.png
>>
>> ...
>
> Again, if you had all the sequences in a list:
>
> plot(sorted(GC(seq.seq) for seq in seqs))

I like the use of sorted here, rather than the two step make a list
then sort it.

Peter

From jflatow at northwestern.edu  Fri Sep 26 12:23:14 2008
From: jflatow at northwestern.edu (Jared Flatow)
Date: Fri, 26 Sep 2008 11:23:14 -0500
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
 <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
 <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
 <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
 <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
Message-ID: <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>

On Sep 26, 2008, at 11:11 AM, Peter wrote:

> I see what you mean - and maybe that is more realistic.
>
> One change I'd make is avoiding using seq or seqs as variable names
> for SeqRecord objects. I've generally tried to use record and
> records in the documentation.

Yeah, I agree I was just being lazy.

> i.e. maybe like this:
>
> import pylab
> from Bio import SeqIO
> records = list(SeqIO.parse(open("ls_orchid.fasta"), "fasta"))
>
> #Histogram of lengths
> pylab.hist([len(record) for record in records], bins=20)
> pylab.title("%i orchid sequences\nLengths %i to %i" \
>             % (len(sizes),min(sizes),max(sizes)))
> pylab.xlabel("Sequence length (bp)")
> pylab.ylabel("Count")
> pylab.show()

Except the title no longer works the same...maybe just:

pylab.title("Distribution of lengths of %i orchid sequences" % len(records))

?

jared

From biopython at maubp.freeserve.co.uk  Fri Sep 26 12:28:30 2008
From: biopython at maubp.freeserve.co.uk (Peter)
Date: Fri, 26 Sep 2008 17:28:30 +0100
Subject: [Biopython-dev] Sequences and simple plots
In-Reply-To: <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com>
 <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu>
 <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com>
 <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com>
 <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com>
 <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu>
Message-ID: <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com>

>> i.e. maybe like this:
>>
>> import pylab
>> from Bio import SeqIO
>> records = list(SeqIO.parse(open("ls_orchid.fasta"), "fasta"))
>>
>> #Histogram of lengths
>> pylab.hist([len(record) for record in records], bins=20)
>> pylab.title("%i orchid sequences\nLengths %i to %i" \
>>             % (len(sizes),min(sizes),max(sizes)))
>> pylab.xlabel("Sequence length (bp)")
>> pylab.ylabel("Count")
>> pylab.show()
>
> Except the title no longer works the same...maybe just:
>
> pylab.title("Distribution of lengths of %i orchid sequences" % len(records))
>
> ?

I spotted that after posting. Whoops. Your suggestion would work,
but I'd rather keep the old full title (partly so I don't have to redo
the PNG file in CVS and on the website).

Did you try the dot-plot example?
Did you have any other ideas for things to plot? Peter From bugzilla-daemon at portal.open-bio.org Fri Sep 26 12:59:47 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 12:59:47 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200809261659.m8QGxlhn030037@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Attachment #972 is|0 |1 obsolete| | ------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 12:59 EST ------- (From update of attachment 972) I think Martin attached this to the wrong bug, see Bug 2547 instead. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Sep 26 13:06:32 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 13:06:32 -0400 Subject: [Biopython-dev] [Bug 2597] New: Enforce alphabet letters in Seq objects Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2597 Summary: Enforce alphabet letters in Seq objects Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: Main Distribution AssignedTo: biopython-dev at biopython.org ReportedBy: biopython-bugzilla at maubp.freeserve.co.uk BugsThisDependsOn: 2532 If a Seq object is created with an alphabet with a pre-defined set of letters (e.g. the IUPAC alphabets) then I think Biopython should validate that the sequence does indeed only use those letters. 
This will catch mis-use of ambiguous sequences with non-ambiguous alphabets, letters in an unexpected case, and most importantly any unexpected symbols (e.g. from a parsing problem). This will impose a performance overhead - which can be avoided if the user instead chooses to use a generic dna/rna/protein alphabet which does not list the letters expected. Note that we will have to resolve Bug 2532 before doing this, as currently some parts of Biopython are mis-using the upper case only IUPAC alphabet objects with mixed case sequences. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Fri Sep 26 13:06:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 13:06:34 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200809261706.m8QH6YWu030456@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- OtherBugsDependingO| |2597 nThis| | -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. 
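The validation proposed in Bug 2597 can be sketched as a set comparison between the sequence and the alphabet's letters. This is an illustration under stated assumptions rather than the eventual Biopython behaviour: check_letters is a hypothetical helper, the letters string "GATC" is assumed to stand in for the upper case unambiguous IUPAC DNA alphabet, and an alphabet without a letters list is represented by None so that the check is skipped.

```python
# Hypothetical validation helper; not an existing Biopython function.
IUPAC_UNAMBIGUOUS_DNA = "GATC"  # assumed letters, upper case only


def check_letters(sequence, letters):
    """Raise ValueError if sequence uses any character outside letters."""
    if not letters:
        return  # generic alphabet: nothing listed, so nothing to check
    unexpected = sorted(set(sequence) - set(letters))
    if unexpected:
        raise ValueError("unexpected letters: %s" % "".join(unexpected))


check_letters("GATTACA", IUPAC_UNAMBIGUOUS_DNA)  # passes silently

# The three failure modes named in the bug report: an ambiguous letter,
# letters in an unexpected case, and an unexpected symbol.
for bad in ("GATNACA", "gattaca", "GAT*ACA"):
    try:
        check_letters(bad, IUPAC_UNAMBIGUOUS_DNA)
    except ValueError as err:
        print(bad, "->", err)
```

Building a set of the sequence is linear in its length, which is the performance overhead the report mentions; skipping the check for generic alphabets avoids it.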
From bugzilla-daemon at portal.open-bio.org Fri Sep 26 13:13:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Fri, 26 Sep 2008 13:13:34 -0400 Subject: [Biopython-dev] [Bug 2532] Using IUPAC alphabets in mixed case Seq objects In-Reply-To: Message-ID: <200809261713.m8QHDYqu030777@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2532 ------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-26 13:13 EST ------- (In reply to comment #0) > Bio.Nexus and Bio.Sequencing.Phd create Seq objects which use these alphabets > even with mixed case sequences. > > This contradicts how I think the alphabet's .letters property is intended > to be used (although currently this is not enforced by the Seq object). I actually identified this issue by making the Seq object check the .letters property as an experiment. I have now filed this as a separate enhancement, Bug 2597. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From jflatow at northwestern.edu Fri Sep 26 14:39:49 2008 From: jflatow at northwestern.edu (Jared Flatow) Date: Fri, 26 Sep 2008 13:39:49 -0500 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com> <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com> <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu> <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com> Message-ID: <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu> On Sep 26, 2008, at 11:28 AM, Peter wrote: > Did you try the dot-plot example? 
I didn't, but it looked good. > Did you have any other ideas for things to plot? Nothing that would be too useful, but just for a demonstration of a scatter plot and putting the different ideas together, it might be nice to do something like: plot([len(rec) for rec in records], [GC(rec.seq) for rec in records], 'o') jared From biopython at maubp.freeserve.co.uk Fri Sep 26 17:29:27 2008 From: biopython at maubp.freeserve.co.uk (Peter) Date: Fri, 26 Sep 2008 22:29:27 +0100 Subject: [Biopython-dev] Sequences and simple plots In-Reply-To: <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu> References: <320fb6e00809250915m42350c70xa51007c50c3c95fe@mail.gmail.com> <5321F5EB-F2C1-4D1A-9A67-878C695C0945@northwestern.edu> <320fb6e00809251239i6308d6b9i6a334701ce1cd5f1@mail.gmail.com> <320fb6e00809260315x62634eadw6b0dd17e074bdeb2@mail.gmail.com> <320fb6e00809260911rf91432cp8f89904330550d6b@mail.gmail.com> <0ACA5A64-645F-4D1F-AC93-EB23D983C987@northwestern.edu> <320fb6e00809260928u4182ee34la768e7fe9f1f7842@mail.gmail.com> <52356F04-48AA-454D-A0F6-83E24BBD03EE@northwestern.edu> Message-ID: <320fb6e00809261429i464e0ee8qe81f7090c2141292@mail.gmail.com> On Fri, Sep 26, 2008 at 7:39 PM, Jared Flatow wrote: > On Sep 26, 2008, at 11:28 AM, Peter wrote: > >> Did you try the dot-plot example? > > I didn't, but it looked good. Hopefully I've pitched it right - I've tried to make it as simple as possible, but the nested list comprehension is perhaps non-obvious. >> Did you have any other ideas for things to plot? > > Nothing that would be too useful, but just for a demonstration of a scatter > plot and putting the different ideas together, it might be nice to do > something like: > > plot([len(rec) for rec in records], [GC(rec.seq) for rec in records], 'o') > I had wondered about this but I couldn't see an obvious motivation - plus on the parsing side there is nothing new. How about plotting melting temperature against sequence length (or against the GC%)? 
This would be more interesting as we'd then also get to show the calculation of another sequence property (using the Bio.SeqUtils.MeltingTemp module). Peter From mjldehoon at yahoo.com Sun Sep 28 07:43:17 2008 From: mjldehoon at yahoo.com (Michiel de Hoon) Date: Sun, 28 Sep 2008 04:43:17 -0700 (PDT) Subject: [Biopython-dev] Numeric / NumPy conversion Message-ID: <621531.40325.qm@web62406.mail.re1.yahoo.com> Hi everybody, Since there were no responses on the mailing list asking to maintain the old Numerical Python alongside the new NumPy, I suggest that we proceed towards a NumPy-only release of Biopython. --Michiel. From bugzilla-daemon at portal.open-bio.org Sun Sep 28 20:57:20 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 28 Sep 2008 20:57:20 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200809290057.m8T0vKw3020416@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #21 from drpatnaik at yahoo.com 2008-09-28 20:57 EST ------- I have tried the latest 'NCBIStandalone.py' file, from CVS (version 1.77). The variable values are as mentioned in comment #16. I no longer get the error from 'os.path.exists'. However, I still get the 'C:/Documents' is not recognized...' error in the error file. By adding a print command to the 'NCBIStandalone.py' file, I can see that the system command being initiated by Python is: "C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.fa.db\"" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq" -m 7 This command works as such if run outside Python. If I put it directly inside the 'os.popen3' call in the 'NCBIStandalone.py' file, I still get the 'C:/Documents' is not recognized...'. 
Same happens if I run a Python file with this code: import os my_cmd = r'"C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe" -p blastn -d "\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.fa.db\"" -i "C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq" -m 7' w, r, e = os.popen3(my_cmd) print e.read() It seems that using the 'subprocess' module is the only way around this. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Sun Sep 28 21:31:26 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Sun, 28 Sep 2008 21:31:26 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200809290131.m8T1VQ9X022896@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #22 from drpatnaik at yahoo.com 2008-09-28 21:31 EST ------- Following up on comment #21 re: subprocess module: I am able to get the local BLAST through Biopython to work if I replace the 'w, r, e = os.popen3(" ".join([blastcmd] + params))' line for 'blastall' in the 'NCBIStandalone.py' file to: import subprocess my_process = subprocess.Popen(" ".join([blastcmd] + params)) w, r, e = (my_process.stdin, my_process.stdout, my_process.stderr) The BLAST results just scroll by on the command-line console application's screen, so this is very crude. I am new to Python, and I hardly know anything about 'subprocess'. Perhaps this will information will help the developers. 
***
--- C:/Documents and Settings/patnaik/My Documents/Python252/Lib/Site-packages/Bio/Blast/NCBIStandalone.py ---
[CVS version 1.77 with the changes outlined above]

--- File C:/Documents and Settings/patnaik/Desktop/test.py ---
# My test file
my_blast_db =r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\mine\""'
my_blast_file =r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin"'
my_blast_exe =r'C:/Documents and Settings/patnaik/My Documents/blast/bin/blastall.exe'
from Bio.Blast import NCBIStandalone
result_handle, error_handle = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file)
error_results = error_handle.read()
save_file = open(r"C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_error", "w")
save_file.write(error_results)
save_file.close()
result_results = result_handle.read()
save_file = open(r"C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_result", "w")
save_file.write(result_results)
save_file.close()

--- Run command ---
python "C:\Documents and Settings\patnaik\Desktop\test.py"

From bugzilla-daemon at portal.open-bio.org Mon Sep 29 05:12:06 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 05:12:06 -0400
Subject: [Biopython-dev] [Bug 2600] New: enhance Seq and SeqRecord to new style classes
Message-ID:

http://bugzilla.open-bio.org/show_bug.cgi?id=2600

           Summary: enhance Seq and SeqRecord to new style classes
           Product: Biopython
           Version: 1.48
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: jblanca at btc.upv.es

In some situations it would be quite useful to deal with new style classes.
I specially find useful the property method available on the new style classes. I have run the Test with and without this modification and I've found no difference at all. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 05:13:45 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 05:13:45 -0400 Subject: [Biopython-dev] [Bug 2600] enhance Seq and SeqRecord to new style classes In-Reply-To: Message-ID: <200809290913.m8T9Dj44021461@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2600 ------- Comment #1 from jblanca at btc.upv.es 2008-09-29 05:13 EST ------- Created an attachment (id=999) --> (http://bugzilla.open-bio.org/attachment.cgi?id=999&action=view) path to transform Seq and SeqRecord into new style classes -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 07:30:15 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 07:30:15 -0400 Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values In-Reply-To: Message-ID: <200809291130.m8TBUFLo029232@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2480 ------- Comment #23 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 07:30 EST ------- Just to confirm: Dealing with spaces in filenames on Windows is horrible, isn't it? There appear to be problems with os.popen3 with a quoted executable name with spaces when there are additional arguments (and for BLAST calls there will always be extra arguments). 
Switching from os.popen3 to the subprocess module (python 2.4+ only) might help, but spaces are still tricky here. I think the best solution is to get rid of the spaces on Windows. In your case you can't move BLAST, but you can call it via the DOS 8.3 style alternative filename (which won't have any spaces). You'll have to install Mark Hammond's win32 extensions from https://sourceforge.net/projects/pywin32/ to do this, using the win32api.GetShortPathName() function. Right now I suggest you try this in your own code before calling Bio.Blast.NCBIStandalone.blastall() to "fix" the exe name, and if needed the database and input filenames too. Assuming this works nicely, we can put a note in the documentation. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:00:03 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 08:00:03 -0400 Subject: [Biopython-dev] [Bug 2596] Add string like split, strip, rstrip and lstrip methods to the Seq object In-Reply-To: Message-ID: <200809291200.m8TC039u030491@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2596 biopython-bugzilla at maubp.freeserve.co.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Summary|Add string like strip, |Add string like split, |rstrip and lstrip methods to|strip, rstrip and lstrip |the Seq object |methods to the Seq object ------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 08:00 EST ------- Adding split onto this bug as discussed on the mailing list. See Bug 2351 comment 15. 
-- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:01:37 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 08:01:37 -0400 Subject: [Biopython-dev] [Bug 2596] Add string like split, strip, rstrip and lstrip methods to the Seq object In-Reply-To: Message-ID: <200809291201.m8TC1boE030672@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2596 ------- Comment #2 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 08:01 EST ------- Created an attachment (id=1000) --> (http://bugzilla.open-bio.org/attachment.cgi?id=1000&action=view) Patch to Bio/Seq.py for Seq object split, strip, lstrip and rstrip methods As discussed on the mailing lists, this differs from the previous suggestions by following the string defaults (split or strip using white space characters). -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:02:55 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 08:02:55 -0400 Subject: [Biopython-dev] [Bug 2351] Make Seq more like a string, even subclass string? In-Reply-To: Message-ID: <200809291202.m8TC2sJM030759@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2351 ------- Comment #16 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 08:02 EST ------- (In reply to comment #15) > This is a suggested implementation of the split method for our Seq object, > modelled after the python string method which it calls internall. 
Note that I > have made the separator non-optional on the grounds that the string method's > default of white space isn't (usually) sensible for sequences. I'm happy to > change this if people this its better to be as close as possible to the string > method. > > def split(self, sep, maxsplit=None) : > """Split method, like that of a python string. > > Return a list of the 'words' in the string (as Seq objects), > using sep as the delimiter string. If maxsplit is given, at > most maxsplit splits are done. > > Unlike the python string method, sep must be specified (as > there shouldn't be any whitespace strings in a sequence). > > e.g. print my_seq.split("-") > """ > if maxsplit : > parts = self.data.split(sep, maxsplit) > else : > parts = self.data.split(sep) > return [Seq(chunk, self.alphabet) for chunk in parts] > After some debate on the mailing list, following the python string method defaults is probably preferable for consistency (even if we don't expect any white space in a Seq object's sequence). I have extended Bug 2596 to cover the split method in addition to the strip methods, and uploaded a revised patch there. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:35:34 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 08:35:34 -0400 Subject: [Biopython-dev] [Bug 2601] New: Seq find() method: proposal Message-ID: http://bugzilla.open-bio.org/show_bug.cgi?id=2601 Summary: Seq find() method: proposal Product: Biopython Version: Not Applicable Platform: All OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: BioSQL AssignedTo: biopython-dev at biopython.org ReportedBy: lpritc at scri.sari.ac.uk A find() method for the Seq object was recently proposed on the mailing list. 
I have extended Seq locally to include a find method that uses the re module and the reverse_complement function from Bio.Seq, and is described below. In the original implementation, the search was meant to be called from the parent SeqRecord object, which populated itself with features describing the search results. I'm proposing this as a potential starting point for the implementation of a Seq.find() method. Note that the loop of re.search() calls was necessary to obtain the set of overlapping matches, as re.finditer() only returns non-overlapping matches. The two functions searching in forward-only and reverse-only directions could probably be combined, and behaviour distinguished on keyword, for neater code.

####
# Methods added to the Seq class; they rely on the re module and on
# Alphabet and IUPAC (from Bio.Alphabet) being imported.
def find_regexes(self, pattern):
    """ find_regexes(self, pattern)

        pattern     String, regular expression to search for

        Finds all occurrences of the passed regular expression in the
        sequence, and returns a list of tuples in the format:
        (start, end, match, strand).
        If the sequence is a nucleotide sequence, the reverse strand is
        also searched
    """
    # Find forward matches
    match_locations = [(hit.start()+1, hit.end(), \
                        self.data[hit.start():hit.end()], 1) \
                       for hit in self.__find_overlapping_regexes(pattern)]
    # If the sequence is a nucleotide sequence, look on the reverse
    # strand, too
    if self.alphabet.__class__ in [Alphabet.DNAAlphabet,
                                   Alphabet.RNAAlphabet,
                                   IUPAC.ExtendedIUPACDNA,
                                   IUPAC.IUPACAmbiguousDNA,
                                   IUPAC.IUPACUnambiguousDNA,
                                   IUPAC.IUPACAmbiguousRNA,
                                   IUPAC.IUPACUnambiguousRNA]:
        # Reverse-strand hits are reported with strand -1
        rev_locations = [(hit.start()+1, hit.end(), \
                          self.data[hit.start():hit.end()], -1) \
                         for hit in \
                         self.__find_overlapping_regexes_rev(pattern)]
        match_locations += rev_locations
    match_locations.sort()
    return match_locations

def __find_overlapping_regexes(self, pattern):
    """ Finds all overlapping regexes matching the passed pattern in
        the sequence, and returns a list of re.SRE_Match objects
        describing them.
    """
    hits = []
    pos = 0
    regex = re.compile(pattern)
    while pos < len(self.data):
        hit = regex.search(self.data, pos=pos)
        if hit is None:
            break
        hits.append(hit)
        pos = hit.start()+1   # restart just past the hit's start to allow overlaps
    return hits

def __find_overlapping_regexes_rev(self, pattern):
    """ Finds all overlapping regexes matching the passed pattern in
        the sequence, and returns a list of re.SRE_Match objects
        describing them, as hits positioned in the forward direction -
        i.e. start and end read in the forward sense.
    """
    hits = []
    pos = 0
    regex = re.compile(reverse_complement(Seq(pattern, self.alphabet)))
    while pos < len(self.data):
        hit = regex.search(self.data, pos=pos)
        if hit is None:
            break
        hits.append(hit)
        pos = hit.start()+1
    return hits

From bugzilla-daemon at portal.open-bio.org Mon Sep 29 08:36:09 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 08:36:09 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291236.m8TCa9Qk032562@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2601

lpritc at scri.sari.ac.uk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|BioSQL                      |Main Distribution
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 09:24:13 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 09:24:13 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291324.m8TDODXq002611@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2601

------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 09:24 EST -------
Note that any Seq.find() method should be as much like the string find method as possible for consistency. One enhancement is that it might be worth checking the search string is valid against the Seq object's alphabet (see also Bug 2597). However, reserving Seq.find() for this string-find-like behaviour doesn't stop us adding more advanced regular expression based methods.

P.S. To determine if a sequence has a nucleotide alphabet, use the fact that any well defined nucleotide alphabet object should be an instance of Bio.Alphabet.NucleotideAlphabet or one of its subclasses, rather than checking a predefined list. However, there is no way of knowing if the sequence is double stranded or single stranded, so personally I don't like the way your suggested function automatically searches the reverse complement strand too.
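The two suggestions in comment #1 (test for a nucleotide alphabet by class hierarchy, and validate the search string against the alphabet) might look roughly like the sketch below. To keep it runnable without Biopython, the classes are small stand-ins that only mirror the hierarchy described in this thread (NucleotideAlphabet above DNAAlphabet); they are not the real Bio.Alphabet classes, and UnambiguousDNA, is_nucleotide and check_query are hypothetical names chosen for illustration.

```python
# Stand-in alphabet hierarchy (an assumption for illustration only).
class Alphabet:
    letters = None  # None means "any letter allowed"


class NucleotideAlphabet(Alphabet):
    pass


class DNAAlphabet(NucleotideAlphabet):
    pass


class UnambiguousDNA(DNAAlphabet):
    letters = "GATC"


class ProteinAlphabet(Alphabet):
    pass


def is_nucleotide(alphabet):
    """True for any alphabet object derived from NucleotideAlphabet."""
    return isinstance(alphabet, NucleotideAlphabet)


def check_query(query, alphabet):
    """Raise ValueError if the query uses letters outside the alphabet."""
    if alphabet.letters is None:
        return
    bad = sorted(set(query) - set(alphabet.letters))
    if bad:
        raise ValueError(
            "Query contains letters not in the alphabet: %s" % "".join(bad)
        )


dna = UnambiguousDNA()
print(is_nucleotide(dna))                # True
print(is_nucleotide(ProteinAlphabet()))  # False
check_query("GATTACA", dna)              # passes silently
try:
    check_query("GATXACA", dna)
except ValueError as error:
    print(error)
```

An isinstance() test keeps working when new nucleotide alphabets are added, which a hard-coded list of classes would not.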
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 10:34:03 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Mon, 29 Sep 2008 10:34:03 -0400
Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal
In-Reply-To:
Message-ID: <200809291434.m8TEY3VE007082@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2601

bsouthey at gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bsouthey at gmail.com

------- Comment #2 from bsouthey at gmail.com 2008-09-29 10:34 EST -------
(In reply to comment #1)

I do think that any general function involving regular expressions should conform to the Python re module. The reasoning follows Peter's point that a user should not have to convert the Seq object into a Python string. While I see the point of the reverse complement and overlapping matches, these are inconsistent with the re module. So I think it would be more valuable to implement specific methods from the re module. In this case, the functions should accept a regular expression.

I also do not see the gain for the reverse complement because this is just another pattern. Also it is potentially confusing because the direction is not immediately apparent without further computation. In this case I think that 'explicit is better than implicit' (The Zen of Python), so I think the decision to use the reverse complement must come prior to the use of this method.
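To make the two positions in this thread concrete, here is a small sketch on plain strings in current Python 3, without Biopython. It is not code from either poster: find_all, its overlapping flag and the reverse_complement helper (unambiguous upper-case DNA only) are assumptions made for illustration. By default it behaves like re.finditer(); overlapping hits are an explicit opt-in, and the reverse strand is searched only when the caller passes the reverse-complemented query.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Reverse complement of an unambiguous, upper-case DNA string."""
    return dna.translate(_COMPLEMENT)[::-1]


def find_all(pattern, text, overlapping=False):
    """Return (start, end) pairs (0-based, end exclusive) for each match."""
    regex = re.compile(pattern)
    if not overlapping:
        # Same behaviour as the re module: resume at the end of each hit.
        return [(hit.start(), hit.end()) for hit in regex.finditer(text)]
    hits = []
    pos = 0
    while pos <= len(text):
        hit = regex.search(text, pos)
        if hit is None:
            break
        hits.append((hit.start(), hit.end()))
        pos = hit.start() + 1  # resume one past the start, so hits may overlap
    return hits


genome = "AAGCTTGGATCCAAAA"
probe = "GGATC"

forward = find_all(probe, genome)                      # caller searches forward
reverse = find_all(reverse_complement(probe), genome)  # caller chooses reverse
runs = find_all("AAA", genome)
runs_overlapping = find_all("AAA", genome, overlapping=True)

print(forward)           # [(6, 11)]
print(reverse)           # [(7, 12)]
print(runs)              # [(12, 15)]
print(runs_overlapping)  # [(12, 15), (13, 16)]
```

Positions here follow Python slice conventions, whereas the proposal in this bug reports start+1; reverse-complementing a literal query is straightforward, but a true regular expression would need its character classes and ambiguity codes complemented as well.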
From bugzilla-daemon at portal.open-bio.org Mon Sep 29 12:01:51 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 12:01:51 -0400 Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage In-Reply-To: Message-ID: <200809291601.m8TG1ppa013194@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2475 ------- Comment #34 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-29 12:01 EST ------- (In reply to comment #33) > Also, yes, the _get_taxon_id() function is getting far too long, and should > probably be restructured as part of this bug. In BioSQL/Loader.py CVS revision 1.34 I have split the _get_taxon_id() method in two - ready to look at integrating Eric's code for fetching the NCBI taxonomy on demand. -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 12:41:10 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 12:41:10 -0400 Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal In-Reply-To: Message-ID: <200809291641.m8TGfA2F015768@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2601 ------- Comment #3 from lpritc at scri.sari.ac.uk 2008-09-29 12:41 EST ------- Make a cup of tea, this is a long one... ;) Peter: >Note that any Seq.find() method should be as like the string find method as >possible for consistency. Bruce: > While I > see the point of the reverse complement and overlapping matches, these are > inconsistent with re module I see your points, but I'm not /entirely/ in agreement, here. 
While I think that it is nearly always a good thing that the input arguments and returned results match those that are expected for same-named functions in similar classes, I think that we may still take the opportunity to implement useful behaviour that is relevant to biological sequences where the intent doesn't stray too far from what you'd expect for a string. For example, the ability to accommodate ambiguous alphabets or regular expressions - not part of string.find() - would be useful. I think that this approach implements additional functionality of which the string.find() method's functionality is a subset, and so could be implemented without breaking the apparent identical operation of string.find() and Seq.find(). This would facilitate the use of string-specific third-party modules that could be useful for analysis of biological sequences, while extending functionality. Where I begin to disagree is on whether it is always desirable to constrain the behaviour of these functions for the sake of consistency with other modules, while still taking time to make them behave differently at all, rather than just implementing that exact same behaviour, and handling the biologically-useful stuff in a different method altogether. I like the idea of making Seq.py more string-like, in part because when I first started using Biopython, I missed being able to slice, and other conveniently string-y things. By way of contrast: string.find() has the behaviour of only returning a single match - that which is closest to the string start. This might be useful to some (in ORF-finding, perhaps), but I expect I would use a finditer() method that returned all matches (for which there is no equivalent string method) almost exclusively, if available. 
I expect that I could cope quite happily with find() doing different things on pure strings and on Seq objects, but I'd be OK with a nonstandard finditer() alongside a 100% string-compatible find() as an alternative to this, though I'd want finditer() to return overlapping matches. Such overlapping matches, however, do not match re.finditer() behaviour. But, in this case, the re method's behaviour is constrained for good reasons related to regular expression implementation, and not reasons related to biological good sense. I think that there is sufficient reason not to be consistent here, and instead to return biologically-useful overlapping matches. The core of my argument here is that we're not just working with strings, but with string representations of biological objects; that's exactly why we have this specialised library, and don't just use strings in the first place. I think that there will be occasions when we should break some syntactic expectations, where it is appropriate for the problem domain, and that this *might* (note equivocation) be one of them. Peter: >One enhancement is that it might be worth checking >the search string is valid against the Seq object's alphabet (see also Bug >2597). Good point. In the implementation I put up here, if there are any invalid characters then the string just won't be found, which may be overgenerous to user error ;) Raising a ValueError or some such to let the user know that the search alphabet wasn't valid would be very helpful. Peter: > To determine if a sequence has a nucleotide alphabet, use the fact that > any well defined nucleotide alphabet object should be a subclass of > Bio.Alphabet.NucleotideAlphabet() rather than checking a predefined list. Fair enough - I didn't know that NucleotideAlphabet existed... 
I got as far up the hierarchy as DNAAlphabet and RNAAlphabet, and stopped at working code ;) Peter: > However, there is no way of knowing if the sequence is double stranded or > single sided, so personally I don't like the way your suggested function > automatically searches the reverse complement strand too. It just suited my purpose at the time. Whether or not the nucleotide sequence is single- or double-stranded, people might still want to search for a complementary sequence; e.g. microarray/PCR/siRNA probes, etc. The method as written reports the strand on which the match can be found, and the user is free to discard results as they see fit, which again suited me at the time. A 'strand' argument to the method of 'forward', 'reverse', or 'both', or just assuming 'both' if not specified would be better, I agree. What drove my implementation above was that, while nucleotide sequence matches may or may not be of interest in either direction, reverse matches to protein sequences are definitely (AFAIAC) not that interesting ;) Bruce: >I do think that any general function involving regular expressions should > conform to the Python re module. The reasoning follows Peter's point that a > user should not have to convert the Seq object into a Python string. I don't think I understand this point. Would you prefer an re.search() like implementation that takes a Seq object as its query argument? I don't think I'd find that as useful, myself, as a method that just takes a string. Such a method could also maybe parse arguments so as to compile the regex from the Seq.data attribute though, fulfilling your requirement. I used regular expression based searching in my implementation for speed, and strictly speaking a string is also a regular expression, even if it doesn't have special characters - I didn't see any inconsistency there. My docstring is maybe a bit misleading about that but, when I wrote it, it wasn't intended for anyone but me to use. Sorry about that. 
Also, I disagree regarding conformance to the re module, particularly as our use of re is likely to be less general than the re module itself - see above.

> So I think it would be more valuable to implement
> specific methods from the re modules. In this case, the functions should accept
> regular expression.

I would quite like to have a 'true' regular expression search method myself, with wildcards for nucleotide symbols, but this would have to be implemented differently to my attempt above: e.g., for proper reverse complement searches, you'd have to reverse complement the wildcards as well as ambiguity codes.

> I also do not see the gain for the reverse complement because this is just
> another pattern.

The gain was that I needed matches to my patterns of interest on the sequence in either direction, and I only cared which strand they lay on for reasons of locating them. Reverse complementing the query is usually quicker than reverse complementing the genome on which you search. Assuming you're searching on a genome, of course ;)

> Also it is potentially confusing because the direction is not
> immediately apparent without further computation.

I'm not sure I understand you: in the above code, the method returns the strand on which the match is found, along with all the other data. The computation required to handle this is the same as that to find the start and end points: parse an integer from the tuple. I'm not intending that the return type should be set in stone and, as I mentioned, it was just a handy step in the creation of SeqFeatures in the parent SeqRecord.

> In this case I think that
> 'explicit is better than implicit' (The Zen of Python) so I think the decision
> to use the reverse complement must come prior to the use of this method.
In the spirit of quoted arguments from authority: "A foolish consistency is the hobgoblin of little minds" (Python Style Guide) ;) -- Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From bugzilla-daemon at portal.open-bio.org Mon Sep 29 16:47:50 2008 From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org) Date: Mon, 29 Sep 2008 16:47:50 -0400 Subject: [Biopython-dev] [Bug 2601] Seq find() method: proposal In-Reply-To: Message-ID: <200809292047.m8TKlokn000682@portal.open-bio.org> http://bugzilla.open-bio.org/show_bug.cgi?id=2601 ------- Comment #4 from bsouthey at gmail.com 2008-09-29 16:47 EST ------- (In reply to comment #3) > Make a cup of tea, this is a long one... ;) > > Peter: > >Note that any Seq.find() method should be as like the string find method as > >possible for consistency. > Bruce: > > While I > > see the point of the reverse complement and overlapping matches, these are > > inconsistent with re module > > I see your points, but I'm not /entirely/ in agreement, here. Good, as where is the fun otherwise? :-) > that it is nearly always a good thing that the input arguments and returned > results match those that are expected for same-named functions in similar > classes, I think that we may still take the opportunity to implement useful > behaviour that is relevant to biological sequences where the intent doesn't > stray too far from what you'd expect for a string. For example, the ability to > accommodate ambiguous alphabets or regular expressions - not part of > string.find() - would be useful. I think that this approach implements > additional functionality of which the string.find() method's functionality is a > subset, and so could be implemented without breaking the apparent identical > operation of string.find() and Seq.find(). 
This would facilitate the use of > string-specific third-party modules that could be useful for analysis of > biological sequences, while extending functionality. > > Where I begin to disagree is on whether it is always desirable to constrain the > behaviour of these functions for the sake of consistency with other modules, > while still taking time to make them behave differently at all, rather than > just implementing that exact same behaviour, and handling the > biologically-useful stuff in a different method altogether. I like the idea of > making Seq.py more string-like, in part because when I first started using > Biopython, I missed being able to slice, and other conveniently string-y > things. Okay, so what is still missing with these new changes? > > By way of contrast: > string.find() has the behaviour of only returning a single match - that which > is closest to the string start. This might be useful to some (in ORF-finding, > perhaps), but I expect I would use a finditer() method that returned all > matches (for which there is no equivalent string method) almost exclusively, if > available. I expect that I could cope quite happily with find() doing > different things on pure strings and on Seq objects, but I'd be OK with a > nonstandard finditer() alongside a 100% string-compatible find() as an > alternative to this, though I'd want finditer() to return overlapping matches. > It is not correct to compare finditer (a re method) to find (a string method) or for that matter re.match or re.search. (I do notice a confusion between these similar but different functions but there are numerous web pages that discuss when one or the other should be used.) I do understand the interest but there two different points that you raised in this bug. First is finding one match (such as re.search or re.match) and finding all matches (such as re.findall or re.finditer). I fully agree with having these. 
The second point is that I definitely think the user, not the developer, has to decide whether or not they want overlapping matches. There is no such option under this implementation.

> Such overlapping matches, however, do not match re.finditer() behaviour. But,
> in this case, the re method's behaviour is constrained for good reasons related
> to regular expression implementation, and not reasons related to biological
> good sense. I think that there is sufficient reason not to be consistent here,
> and instead to return biologically-useful overlapping matches.

I am neither for nor against having a method that returns overlapping matches; rather, I am against overlapping matches being the only choice.

>
> The core of my argument here is that we're not just working with strings, but
> with string representations of biological objects; that's exactly why we have
> this specialised library, and don't just use strings in the first place. I
> think that there will be occasions when we should break some syntactic
> expectations, where it is appropriate for the problem domain, and that this
> *might* (note equivocation) be one of them.
>
> Peter:
> >One enhancement is that it might be worth checking
> >the search string is valid against the Seq object's alphabet (see also Bug
> >2597).
>
> Good point. In the implementation I put up here, if there are any invalid
> characters then the string just won't be found, which may be overgenerous to
> user error ;) Raising a ValueError or some such to let the user know that the
> search alphabet wasn't valid would be very helpful.
>
> Peter:
> > To determine if a sequence has a nucleotide alphabet, use the fact that
> > any well defined nucleotide alphabet object should be a subclass of
> > Bio.Alphabet.NucleotideAlphabet() rather than checking a predefined list.
>
> Fair enough - I didn't know that NucleotideAlphabet existed...
I got as far up > the hierarchy as DNAAlphabet and RNAAlphabet, and stopped at working code ;) > > Peter: > > However, there is no way of knowing if the sequence is double stranded or > > single sided, so personally I don't like the way your suggested function > > automatically searches the reverse complement strand too. > > It just suited my purpose at the time. Whether or not the nucleotide sequence > is single- or double-stranded, people might still want to search for a > complementary sequence; e.g. microarray/PCR/siRNA probes, etc. The method as > written reports the strand on which the match can be found, and the user is > free to discard results as they see fit, which again suited me at the time. A > 'strand' argument to the method of 'forward', 'reverse', or 'both', or just > assuming 'both' if not specified would be better, I agree. > > What drove my implementation above was that, while nucleotide sequence matches > may or may not be of interest in either direction, reverse matches to protein > sequences are definitely (AFAIAC) not that interesting ;) > > Bruce: > >I do think that any general function involving regular expressions should > > conform to the Python re module. The reasoning follows Peter's point that a > > user should not have to convert the Seq object into a Python string. > > I don't think I understand this point. Would you prefer an re.search() like > implementation that takes a Seq object as its query argument? I don't think > I'd find that as useful, myself, as a method that just takes a string. Such a > method could also maybe parse arguments so as to compile the regex from the > Seq.data attribute though, fulfilling your requirement. What I mean is that a user should be able to either specify the pattern or specify a regular expression object. In either case the optional flags that are often useful to have like ignorecase are ignored. 
>
> I used regular expression based searching in my implementation for speed, and
> strictly speaking a string is also a regular expression, even if it doesn't
> have special characters - I didn't see any inconsistency there. My docstring
> is maybe a bit misleading about that but, when I wrote it, it wasn't intended
> for anyone but me to use. Sorry about that.
>
> Also, I disagree regarding conformance to the re module, particularly as our
> use of re is likely to be less general than the re module itself - see above.
>
> > So I think it would be more valuable to implement
> > specific methods from the re modules. In this case, the functions should accept
> > regular expression.
>
> I would quite like to have a 'true' regular expression search method myself,
> with wildcards for nucleotide symbols, but this would have to be implemented
> differently to my attempt above: e.g., for proper reverse complement searches,
> you'd have to reverse complement the wildcards as well as ambiguity codes.
>
> > I also do not see the gain for the reverse complement because this is just
> > another pattern.
>
> The gain was that I needed matches to my patterns of interest on the sequence
> in either direction, and I only cared which strand they lay on for reasons of
> locating them. Reverse complementing the query is usually quicker than reverse
> complementing the genome on which you search. Assuming you're searching on a
> genome, of course ;)
>
> > Also it is potentially confusing because the direction is not
> > immediately apparent without further computation.
>
> I'm not sure I understand you: in the above code, the method returns the strand
> on which the match is found, along with all the other data. The computation
> required to handle this is the same as that to find the start and end points:
> parse an integer from the tuple.
> I'm not intending that the return type should be set in stone and, as I
> mentioned, it was just a handy step in the creation of SeqFeatures in the
> parent SeqRecord.

Regardless of what a user actually wants, they must wait for two searches
along the sequence. After that finishes, the user must examine each and every
entry (due to the match_locations.sort()) to find the strand, regardless of
what they want to do. I do not see any advantage in this over someone calling
the function twice to get match_locations and rev_locations, doing
'match_locations += rev_locations' and match_locations.sort().

>
> > In this case I think that
> > 'explicit is better than implicit' (The Zen of Python) so I think the decision
> > to use the reverse complement must come prior to the use of this method.
>
> In the spirit of quoted arguments from authority: "A foolish consistency is the
> hobgoblin of little minds" (Python Style Guide) ;)
>

Okay, then more Zen: "In the face of ambiguity, refuse the temptation to
guess."

From bugzilla-daemon at portal.open-bio.org Tue Sep 30 01:53:02 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 01:53:02 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200809300553.m8U5r2OB000429@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #24 from drpatnaik at yahoo.com 2008-09-30 01:53 EST -------
[I myself do not need to work on a Windows machine, and I am following this
bug out of curiosity.]

I have tried the pywin32-short-path-name approach. It seems I have to use
win32api.GetShortPathName on the values for the BLAST exe as well as the
database and the input file.
If I don't do so, the process seems to hang: I see only a blinking cursor for
2-3 minutes, at which point I just quit the console application (I know the
BLAST process itself takes less than a second to finish). The error file has
'^C' and the result file is empty.

But because the specified database ('mine') is really not a file,
GetShortPathName fails, unless I use code like:

    my_blast_db = win32api.GetShortPathName('C:/Documents and Settings/patnaik/My Documents/blast/bin/mine.nin')[:-4]

Even then I see the hang, and get an error file similar to before [or empty].

With a print command in NCBIStandalone.py, I can see that the value being
passed on to the system is:

    C:/DOCUME~1/patnaik/MYDOCU~1/blast/bin/blastall.exe -p blastn -d C:/DOCUME~1/patnaik/MYDOCU~1/blast/bin/HAIRPI~1 -i C:/DOCUME~1/patnaik/MYDOCU~1/blast/bin/30a.seq -m 7

This value is right as it works by itself.

From bugzilla-daemon at portal.open-bio.org Tue Sep 30 03:45:14 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 03:45:14 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200809300745.m8U7jEQ3008574@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #25 from drpatnaik at yahoo.com 2008-09-30 03:45 EST -------
Using subprocess I am now able to get Biopython to run local BLAST
successfully in Windows when spaces are present in file-path values.
With a conditional statement like the one below, the following type of
modification will still let Biopython remain compatible with old versions of
Python that cannot use subprocess:

    # replace lines 1680-1682 of CVS 1.78 of Bio/Blast/NCBIStandalone.py with these
    try:
        import subprocess
        my_process = subprocess.Popen(" ".join([blastcmd] + params),
                                      stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                      stderr=subprocess.PIPE, shell=False)
        r, e = my_process.communicate('through stdin to stdout')
        return r, e
    except:
        w, r, e = os.popen3(" ".join([blastcmd] + params))
        w.close()
        return File.UndoHandle(r), File.UndoHandle(e)

The test.py file I tested:

    # Note the unusual ways to specify the database, input file, and BLAST locations
    my_blast_db = r'"\"C:\Documents and Settings\patnaik\My Documents\blast\bin\hairpin.db\""'
    my_blast_file = r'"C:\Documents and Settings\patnaik\My Documents\blast\bin\30a.seq"'
    my_blast_exe = r"C:\Documents and Settings\patnaik\My Documents\blast\bin\blastall.exe"

    from Bio.Blast import NCBIStandalone
    my_blast_result, my_blast_error = NCBIStandalone.blastall(my_blast_exe, "blastn", my_blast_db, my_blast_file)

    # Note the way the save_file path is specified
    save_file = open(r'C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_error', "w")
    save_file.write(my_blast_error)
    save_file.close()
    save_file = open(r'C:/Documents and Settings/patnaik/My Documents/blast/bin/my_blast_result', "w")
    save_file.write(my_blast_result)
    save_file.close()

From bugzilla-daemon at portal.open-bio.org Tue Sep 30 08:49:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 08:49:45 -0400
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g.
align[1:2, 5:-5]
In-Reply-To:
Message-ID: <200809301249.m8UCnjXI001499@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2551

------- Comment #1 from jblanca at btc.upv.es 2008-09-30 08:49 EST -------
Created an attachment (id=1001)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1001&action=view)
new Alignment implementation example

Implementation of an Alignment-like class (here named Assembly) capable of
covering the cases proposed by Peter and also capable of holding sequences
that do not start and end at the same location.

From bugzilla-daemon at portal.open-bio.org Tue Sep 30 08:55:26 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 08:55:26 -0400
Subject: [Biopython-dev] [Bug 2551] Adding advanced __getitem__ to generic alignment, e.g. align[1:2, 5:-5]
In-Reply-To:
Message-ID: <200809301255.m8UCtQ9R001967@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2551

------- Comment #2 from jblanca at btc.upv.es 2008-09-30 08:55 EST -------
In comment #1 I present a class that could be easily adapted to be
compatible with the BioPython Alignment API but with some extra capabilities.

It can hold sequences that start and end at different places (like the EST
assemblies). It can also have a consensus, although that's a minor
improvement. And it can hold in the rows any sequence-like class like Seq,
str, list or tuple. This would be, I hope, quite future proof; we could also
add Quality, SeqWithQuality or whatever.

It doesn't deal with the alphabet; I think that a subclass should be created
to add that capability. I haven't added alphabets to this class, to keep
compatibility with all the sequence-like objects that have no alphabet.
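
For readers without the attachment to hand, the idea of rows that start at different columns combined with align[1:2, 5:-5] style indexing can be sketched roughly as follows. This is not the attached Assembly class: the class name OffsetAlignment, the add_row method and the choice to return a plain list are all assumptions made for illustration.

```python
class OffsetAlignment(object):
    """Toy alignment whose rows may begin at different columns.

    Rows can be any sliceable sequence-like object (str, list, tuple, ...).
    Hypothetical sketch, not the Assembly class from the attachment.
    """

    def __init__(self):
        self._rows = []  # list of (start_column, sequence) pairs

    def add_row(self, sequence, start=0):
        self._rows.append((start, sequence))

    def width(self):
        """Number of alignment columns spanned by all rows together."""
        if not self._rows:
            return 0
        return max(start + len(seq) for start, seq in self._rows)

    def __getitem__(self, key):
        row_slice, col_slice = key
        # Resolve negative or open-ended column bounds against the full width.
        col_start, col_stop, col_step = col_slice.indices(self.width())
        if col_step != 1:
            raise ValueError("this sketch only supports contiguous columns")
        result = []
        for start, seq in self._rows[row_slice]:
            # Translate alignment columns into indices within this row.
            lo = max(col_start - start, 0)
            hi = max(col_stop - start, 0)
            result.append(seq[lo:hi])
        return result


assembly = OffsetAlignment()
assembly.add_row("ACGTACGT", start=0)
assembly.add_row("GTAC", start=2)
assembly.add_row(["T", "A"], start=4)
window = assembly[0:2, 2:6]
```

Because only slicing and len() are used on the rows, anything sequence-like can be stored, which matches the stated goal of not tying the container to an alphabet.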
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 10:32:45 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 10:32:45 -0400
Subject: [Biopython-dev] [Bug 2475] BioSQL.Loader should reuse existing taxon entries in lineage
In-Reply-To:
Message-ID: <200809301432.m8UEWjtq009707@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2475

------- Comment #35 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-30 10:32 EST -------
BioSQL/BioSeqDatabase.py revision 1.18 and BioSQL/Loader.py revision 1.35 in
CVS include what I think is a working version of the BioSQL loader which can
fetch taxonomy from the NCBI via Bio.Entrez. This is based in part on Eric's
code but includes several additional features (e.g. recording the genetic
code which the NCBI provides with the taxonomy data).

When the NCBI fetching is disabled, but an NCBI taxon ID is known, only a
minimal taxonomy record is recorded (without the lineage). This can then be
completed by running the BioSQL load_ncbi_taxonomy.pl script.

There is still scope for improvement, e.g.

* _get_taxon_id_from_ncbi_lineage doesn't really need to be recursive.
* When there is no NCBI taxon ID present in the SeqRecord this code will not
  attempt to search for the taxonomy based on the species name. I'm not sure
  if doing this search is a good idea or not...
* We could make an Entrez.efetch call for each row added to the table (rather
  than as currently just one call per lineage) which should allow us to fetch
  the genetic code for all the entries. On balance I think this is not needed,
  and can be populated by the BioSQL load_ncbi_taxonomy.pl script anyway.
This has passed the unit tests and my own initial testing, and I intend to
use this code a lot more this week/next week. However, it would be great to
have some additional testing of this as is.

From bugzilla-daemon at portal.open-bio.org Tue Sep 30 11:34:37 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 11:34:37 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200809301534.m8UFYbEP014418@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #26 from biopython-bugzilla at maubp.freeserve.co.uk 2008-09-30 11:34 EST -------
(In reply to comment #25)
> Using subprocess I am now able to get Biopython to run local
> BLAST successfully in Windows when spaces are present in
> file-path values.

Good - but have you been able to try your code on Linux or the Mac?

> With a conditional statement like the one below, following
> type of modification will still let Biopython remain
> compatible with old versions of Python that cannot use
> subprocess:

If we can take advantage of the subprocess module in a cross platform way,
then yes, a try/except fall back for python 2.3 would be nice.

As of Blast/NCBIStandalone.py CVS revision 1.79, there is now only one place
in this module where such a change can be applied (rather than three places).

> # replace lines 1680-1682 of CVS 1.78 of Bio/Blast/NCBIStandalone.py with
> these
>
> try:
>     import subprocess
>     my_process = subprocess.Popen(" ".join([blastcmd] + params),
>         stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE,
>         shell=False)

Using shell=False works while shell=True fails on Windows (I tested on
Windows XP with Python 2.5 from IDLE).
However, the opposite is true on Mac OS X with python 2.5 from IDLE. This is
a pain.

Also you don't need the stdin=... argument as we don't want to give BLAST any
piped input.

>     r, e = my_process.communicate('through stdin to stdout')
>     return r, e

First of all, there is no reason to pipe the text "through stdin to stdout"
into BLAST's standard input. I guess you blindly cut and pasted this from a
google search. Instead just:

    r, e = my_process.communicate()

You should NOT be using the communicate method, as it will read in and buffer
all the output and wait for BLAST to finish. As BLAST output (especially XML
output) can be large (gigabytes) we must not load this into memory. Instead:

    r = my_process.stdout
    e = my_process.stderr

> except:
>     w, r, e = os.popen3(" ".join([blastcmd] + params))
>     w.close()
>     return File.UndoHandle(r), File.UndoHandle(e)
>
> The test.py file I tested:
>
> # Note the unusual ways to specify the database, input file, and BLAST
> locations
> my_blast_db = r'"\"C:\Documents and Settings\patnaik\My
> Documents\blast\bin\hairpin.db\""'
> my_blast_file = r'"C:\Documents and Settings\patnaik\My
> Documents\blast\bin\30a.seq"'
> my_blast_exe = r"C:\Documents and Settings\patnaik\My
> Documents\blast\bin\blastall.exe"

I've been having trouble with specifying BLAST databases with spaces in the
path. Have you been able to demonstrate this with more than one database?
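
One way to sidestep both the quoting of spaces and the shell=True/shell=False difference between platforms is to hand subprocess.Popen a list of arguments rather than one joined string, and to read the output handle incrementally instead of calling communicate(). The sketch below is an illustration under assumptions, not the NCBIStandalone code: the helper names start_command and iter_stdout_lines are hypothetical, and the Python interpreter stands in for blastall so that the example runs anywhere.

```python
import subprocess
import sys


def start_command(executable, arguments):
    """Start a child process without a shell (hypothetical helper).

    Each list element is passed as one argument, so paths containing
    spaces need no extra quoting or escaping.
    """
    command = [executable] + list(arguments)
    return subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, shell=False)


def iter_stdout_lines(process):
    """Yield standard output line by line, then close handles and reap."""
    for line in process.stdout:
        yield line
    process.stdout.close()
    process.stderr.close()
    process.wait()


# Stand-in for BLAST: a child Python process that echoes its one argument,
# which deliberately contains a space.
child = start_command(sys.executable,
                      ["-c", "import sys; print(sys.argv[1])",
                       "My Documents/30a.seq"])
output_lines = list(iter_stdout_lines(child))
```

One caveat with reading only stdout: if the child writes a large amount to stderr, that pipe can fill and block the child, so a real implementation would need to drain or redirect stderr as well.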
From bugzilla-daemon at portal.open-bio.org Tue Sep 30 18:19:22 2008
From: bugzilla-daemon at portal.open-bio.org (bugzilla-daemon at portal.open-bio.org)
Date: Tue, 30 Sep 2008 18:19:22 -0400
Subject: [Biopython-dev] [Bug 2480] Local BLAST fails: Spaces in Windows file-path values
In-Reply-To:
Message-ID: <200809302219.m8UMJMa8016035@portal.open-bio.org>

http://bugzilla.open-bio.org/show_bug.cgi?id=2480

------- Comment #27 from drpatnaik at yahoo.com 2008-09-30 18:19 EST -------
As I mentioned earlier, I am new to Python, and my usage of subprocess is
indeed imperfect.

I tried the subprocess routine through a test.py file on Mac OS X 10.5.5
with Python 2.5.2, but w/o using Biopython. I had to use 'shell=True';
otherwise, with 'shell=False', I get:

      File "/Lab/Laboratory/Libs/Python/lib/python2.5/subprocess.py", line 594, in __init__
        errread, errwrite)
      File "/Lab/Laboratory/Libs/Python/lib/python2.5/subprocess.py", line 1091, in _execute_child
        raise child_exception

With 'shell=True', it works even when there is a space in the
file-path/names of the BLAST executable, the database or the input sequence
file (the escaping of the spaces needs to be properly done).
--- test.py ---

    # Note escaping of the space characters vary for
    my_blast_cmd = r'"/Lab/Laboratory/Libs/NCBI_blast/bin/Change loc/blastall" -p blastn -d "\"/Lab/Laboratory/Libs/NCBI_blast/data/Change loc/My db\"" -i /Lab/Laboratory/Libs/NCBI_blast/data/My\ seq.txt -m 7'

    import subprocess
    my_process = subprocess.Popen(my_blast_cmd, stdin=subprocess.PIPE,
                                  stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                  shell=True)
    my_blast_result, my_blast_error = my_process.communicate('through stdin to stdout')

    save_file = open('/Users/patnaik/Desktop/my_blast_error', "w")
    save_file.write(my_blast_error)
    save_file.close()
    save_file = open('/Users/patnaik/Desktop/my_blast_result', "w")
    save_file.write(my_blast_result)
    save_file.close()

From mjldehoon at yahoo.com Tue Sep 30 20:24:18 2008
From: mjldehoon at yahoo.com (Michiel de Hoon)
Date: Tue, 30 Sep 2008 17:24:18 -0700 (PDT)
Subject: [Biopython-dev] Numpy conversion
In-Reply-To: <37659.57326.qm@web62402.mail.re1.yahoo.com>
Message-ID: <228132.43778.qm@web62402.mail.re1.yahoo.com>

Bio.kNN is the only module that imports Bio.distance. Bio.distance is written
in Python, but it also imports a C version of Bio.distance if it is
available. From the comments in the code, I gather that the purpose of the
C-version is to get fast distance calculations without using Numeric / NumPy.
However, Bio.kNN itself uses Numeric / NumPy, which defeats the purpose of
the C-version of Bio.distance. I would therefore like to propose to add a
NumPy-aware version of the code in Bio.distance to Bio.kNN, and to deprecate
Bio.distance. Any objections?

--Michiel.
--- On Thu, 9/18/08, Michiel de Hoon wrote:

> From: Michiel de Hoon
> Subject: Re: [Biopython-dev] Numpy conversion
> To: "Peter"
> Cc: biopython-dev at biopython.org
> Date: Thursday, September 18, 2008, 10:10 AM
> > I've not used it myself, but it sounds handy. Michiel,
> > does this overlap at all with your clustering module?
>
> No, it doesn't. Bio.Cluster contains unsupervised
> clustering methods only. The k-nearest neighbors in Bio.kNN
> is a supervised learning method.
>
> --Michiel.
>
> --- On Wed, 9/17/08, Peter wrote:
>
> > From: Peter
> > Subject: Re: [Biopython-dev] Numpy conversion
> > To: mjldehoon at yahoo.com
> > Cc: biopython-dev at biopython.org
> > Date: Wednesday, September 17, 2008, 10:29 AM
> > On Wed, Sep 17, 2008 at 3:13 PM, Michiel de Hoon wrote:
> > > Hi everybody,
> > >
> > > I am now looking at the pure-python modules that make use of
> > > Numerical Python / NumPy. Bio.kNN is one of them; this also happens
> > > to be the only module that imports Bio.distance, which also depends
> > > on NumPy.
> > >
> > > What I am not sure about is the usage of Bio.kNN. A quick google
> > > search didn't reveal much, suggesting that it is not widely used.
> > > Bio.kNN currently is not documented in the tutorial, but the code
> > > itself is reasonably well documented.
> > >
> > > How do you guys feel about this module? Should we keep it?
> > >
> >
> > I've not used it myself, but it sounds handy. Michiel, does this
> > overlap at all with your clustering module?
> >
> > Peter
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev