Ace contig class
Revision as of 23:57, 30 June 2009
This page describes the Contig class used in Bio.Sequencing.Ace to hold all of the information about a single contig record in an Ace file.
A contig is a set of overlapping sequences used to generate a consensus sequence for a given region of a genome. Ace files usually store contigs (that is, a consensus sequence and the DNA sequences used to generate it) created with phrap (http://www.phrap.org/phredphrapconsed.html) or as part of 454 sequencing projects. Biopython's Contig class follows the structure of the Ace file format closely, so it can present a steep learning curve if you are not familiar with these files. A description of the format can be found under the heading "ACE FILE FORMAT" in the consed readme (http://bozeman.mbt.washington.edu/consed/distributions/README.14.0.txt). The diagram below gives an overview of how some of the most important information is stored.
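For readers new to the format, the sketch below may help show how the tagged lines of an Ace record relate to the pieces of information held for a contig. It is not Biopython's parser: `TOY_ACE` is a made-up, heavily shortened record, `summarize_ace` is a hypothetical helper, and the dictionary keys merely borrow the attribute names used on this page (`sequence`, `quality`, `af`, `reads`). It relies only on the standard tags `CO` (consensus), `BQ` (base qualities), `AF` (read placement) and `RD` (read sequence), and skips everything else.

```python
# Made-up, heavily shortened Ace record, for illustration only.
TOY_ACE = """\
AS 1 2

CO Contig1 12 2 1 U
aatacgGGATTG

BQ
0 0 0 0 0 0 22 23 25 28 34 47

AF read1 U -2
AF read2 U 1
BS 1 12 read2

RD read1 8 0 0
tagaatac

QA 1 8 1 8
DS CHROMAT_FILE: read1

RD read2 12 0 0
aatacgGGATTG

QA 1 12 1 12
DS CHROMAT_FILE: read2
"""


def summarize_ace(text):
    """Collect consensus, qualities, AF and RD data from one toy contig record.

    Hypothetical helper: handles a single contig and ignores every tag
    other than CO, BQ, AF and RD.
    """
    contig = {"name": None, "sequence": "", "quality": [], "af": [], "reads": []}
    section = None  # which multi-line block we are inside, if any
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            section = None  # a blank line ends a sequence or quality block
        elif section == "consensus":
            contig["sequence"] += line.strip()
        elif section == "quality":
            contig["quality"].extend(int(q) for q in fields)
        elif section == "read":
            contig["reads"][-1]["sequence"] += line.strip()
        elif fields[0] == "CO":  # CO <name> <bases> <reads> <segments> <U|C>
            contig["name"] = fields[1]
            section = "consensus"
        elif fields[0] == "BQ":  # consensus base qualities follow
            section = "quality"
        elif fields[0] == "AF":  # AF <read name> <U|C> <padded start>
            contig["af"].append({"name": fields[1], "padded_start": int(fields[3])})
        elif fields[0] == "RD":  # RD <read name> <padded bases> ...
            contig["reads"].append({"name": fields[1], "sequence": ""})
            section = "read"
        # all other tags (AS, BS, QA, DS, ...) are skipped in this sketch
    return contig


summary = summarize_ace(TOY_ACE)
print(summary["sequence"])
print(summary["quality"])
print([entry["padded_start"] for entry in summary["af"]])
```

Note that the placement of a read (`AF`) and its sequence (`RD`) sit in separate parts of the record, which is why the Contig class keeps them in two separate lists.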
Summary Diagram
Examples
# Start by parsing a file to get some contigs
>>> from Bio.Sequencing import Ace
>>> ace_gen = Ace.parse(open("contig1.ace", 'r'))  # from Tests/Ace/contig1.ace
>>> contig = next(ace_gen)
# just the consensus sequence
>>> contig.sequence
'aatacgGGATTGCCCTAGTAACGGCGAGTGAAGCGGCAACAGCTCAAATTTG......'
# assembler designated quality for each base in the consensus
>>> contig.quality
[0, 0, 0, 0, 0, 0, 22, 23, 25, 28, 34, 47,...]
# Ace files also contain information on the reads that make up the consensus.
# Information about the reads is held in two lists, "reads" and "af",
# so, to get the sequence of every read in a contig
# and that read's position in its contig:
>>> for i in range(len(contig.reads)):
...     contig.af[i].padded_start
...     contig.reads[i].rd.sequence
...
-14
'tagcgaggaaagaacccaacaGg...'
1
'aatacgGGATTGCCCTagtaacG...'
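Because "af" and "reads" are parallel lists, walking them together with `zip` avoids the index bookkeeping. The sketch below is only an illustration: `read_placements` is a hypothetical helper, and `demo_contig` is a stand-in built from `SimpleNamespace` objects that carry just the two attributes used above (`padded_start` and `rd.sequence`), so the code runs without an Ace file. The end coordinate assumes the read sequence is stored in padded form, so that its length equals the span it covers in the padded consensus.

```python
from types import SimpleNamespace


def read_placements(contig):
    """Return (padded_start, padded_end, sequence) for each read in a contig.

    Assumes contig.af and contig.reads list the reads in the same order,
    as the indexed loop in the example above does.
    """
    placements = []
    for af, read in zip(contig.af, contig.reads):
        start = af.padded_start
        # Assumption: the stored read sequence is padded, so its length
        # is the number of consensus positions it spans.
        end = start + len(read.rd.sequence) - 1
        placements.append((start, end, read.rd.sequence))
    return placements


# Stand-in for a parsed contig, with made-up reads (not taken from contig1.ace).
demo_contig = SimpleNamespace(
    af=[SimpleNamespace(padded_start=-2), SimpleNamespace(padded_start=1)],
    reads=[
        SimpleNamespace(rd=SimpleNamespace(sequence="tagaatac")),
        SimpleNamespace(rd=SimpleNamespace(sequence="aatacgGGATTG")),
    ],
)

for start, end, sequence in read_placements(demo_contig):
    print(start, end, sequence)
```

With a contig obtained from `Ace.parse`, the same function should apply unchanged, provided the two lists are aligned as assumed. A start position below 1 means the read begins before the first base of the consensus.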
