Package BioSQL :: Module Loader :: Class DatabaseLoader

Class DatabaseLoader

Object used to load SeqRecord objects into a BioSQL database.

Instance Methods

__init__(self, adaptor, dbid, fetch_NCBI_taxonomy=False)
Initialize with connection information for the database.

load_seqrecord(self, record)
Load a Biopython SeqRecord into the database.

_get_ontology_id(self, name, definition=None)
Return the identifier for the named ontology (PRIVATE).

_get_term_id(self, name, ontology_id=None, definition=None, identifier=None)
Get the id that corresponds to a term (PRIVATE).

_add_dbxref(self, dbname, accession, version)
Insert a dbxref and return its id.

_get_taxon_id(self, record)
Get the taxon id for this record (PRIVATE).

_fix_name_class(self, entrez_name)
Map Entrez name terms to those used in taxdump (PRIVATE).

_get_taxon_id_from_ncbi_taxon_id(self, ncbi_taxon_id, scientific_name=None, common_name=None)
Get the taxon id for this record from the NCBI taxon ID (PRIVATE).

_get_taxon_id_from_ncbi_lineage(self, taxonomic_lineage)
Record a taxonomic lineage and return the taxon id of its final entry; recursive (PRIVATE).

_load_bioentry_table(self, record)
Fill the bioentry table with sequence information (PRIVATE).

_load_bioentry_date(self, record, bioentry_id)
Add the effective date of the entry into the database.

_load_biosequence(self, record, bioentry_id)
Record a SeqRecord's sequence and alphabet in the database (PRIVATE).

_load_comment(self, record, bioentry_id)
Record a SeqRecord's annotated comment in the database (PRIVATE).

_load_annotations(self, record, bioentry_id)
Record a SeqRecord's misc annotations in the database (PRIVATE).

_load_reference(self, reference, rank, bioentry_id)
Record a SeqRecord's annotated references in the database (PRIVATE).

_load_seqfeature(self, feature, feature_rank, bioentry_id)
Load a Biopython SeqFeature into the database (PRIVATE).

_load_seqfeature_basic(self, feature_type, feature_rank, bioentry_id)
Load the first tables of a seqfeature and return the id (PRIVATE).

_load_seqfeature_locations(self, feature, seqfeature_id)
Load all of the locations for a SeqFeature into tables (PRIVATE).

_insert_location(self, location, rank, seqfeature_id)
Add a location of a SeqFeature to the seqfeature_location table (PRIVATE).

_load_seqfeature_qualifiers(self, qualifiers, seqfeature_id)
Insert the (key, value) pair qualifiers relating to a feature (PRIVATE).

_load_seqfeature_dbxref(self, dbxrefs, seqfeature_id)
Add database cross-references of a SeqFeature to the database (PRIVATE).

_get_dbxref_id(self, db, accession)
Find and return the dbxref_id (an Int) for the given database and accession, inserting a new dbxref row if none exists.

_get_seqfeature_dbxref(self, seqfeature_id, dbxref_id, rank)
Check for a pre-existing seqfeature_dbxref entry with the passed seqfeature_id and dbxref_id.

_add_seqfeature_dbxref(self, seqfeature_id, dbxref_id, rank)
Insert a seqfeature_dbxref row and return the seqfeature_id and dbxref_id.

_load_dbxrefs(self, record, bioentry_id)
Load any sequence-level cross-references into the database (PRIVATE).

_get_bioentry_dbxref(self, bioentry_id, dbxref_id, rank)
Check for a pre-existing bioentry_dbxref entry with the passed bioentry_id and dbxref_id.

_add_bioentry_dbxref(self, bioentry_id, dbxref_id, rank)
Insert a bioentry_dbxref row and return the bioentry_id and dbxref_id.

Method Details

__init__(self, adaptor, dbid, fetch_NCBI_taxonomy=False)
(Constructor)

Initialize with connection information for the database.

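Creating a DatabaseLoader object is normally handled by a
BioSeqDatabase object obtained from a DBServer, through its load
method, for example:

    from Bio import SeqIO
    from BioSQL import BioSeqDatabase
    server = BioSeqDatabase.open_database(driver="MySQLdb", user="gbrowse",
                                          passwd="biosql", host="localhost",
                                          db="test_biosql")
    try:
        db = server["test"]
    except KeyError:
        db = server.new_database("test", description="For testing GBrowse")
    # load() creates a DatabaseLoader internally; "example.gbk" is a placeholder
    count = db.load(SeqIO.parse("example.gbk", "genbank"))
    server.commit()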

_get_ontology_id(self, name, definition=None)

Returns the identifier for the named ontology (PRIVATE).

This looks through the ontology table for the given entry name.
If it is not found, a row is added for this ontology (using the
definition if supplied).  In either case, the id corresponding to
the provided name is returned, so that you can reference it in
another table.
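As a rough illustration of this look-up-then-insert behaviour, here is a minimal sketch using Python's built-in sqlite3 module against a cut-down, hypothetical ontology table (only ontology_id, name and definition columns). It is not Biopython's implementation, which goes through the BioSQL adaptor; the function name get_ontology_id and the example ontology name are illustrative.

```python
import sqlite3
from typing import Optional


def get_ontology_id(conn: sqlite3.Connection, name: str,
                    definition: Optional[str] = None) -> int:
    """Return the ontology_id for name, adding a row if it is missing."""
    row = conn.execute(
        "SELECT ontology_id FROM ontology WHERE name = ?", (name,)
    ).fetchone()
    if row is not None:
        return row[0]  # existing entry: reuse its id
    cur = conn.execute(
        "INSERT INTO ontology (name, definition) VALUES (?, ?)",
        (name, definition),
    )
    return cur.lastrowid  # new id assigned by the database


# Cut-down, hypothetical ontology table for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ontology (ontology_id INTEGER PRIMARY KEY,"
    " name TEXT UNIQUE NOT NULL, definition TEXT)"
)
first = get_ontology_id(conn, "Example Ontology", "Made-up definition")
second = get_ontology_id(conn, "Example Ontology")
print(first == second)  # True: the second call finds the existing row
```

Because the lookup happens before the insert, repeated calls are safe and always hand back the same key for use in other tables.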

_get_term_id(self, name, ontology_id=None, definition=None, identifier=None)

Get the id that corresponds to a term (PRIVATE).

This looks through the term table for the given term. If it
is not found, a new id corresponding to this term is created.
In either case, the id corresponding to that term is returned, so
that you can reference it in another table.

The ontology_id should be used to disambiguate the term.
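To show why ontology_id matters, the following sketch keys a hypothetical in-memory term store on (name, ontology_id), so the same term name under two ontologies gets two different ids. The dictionary stands in for the term table and the sequential ids for an auto-increment key; both are assumptions for illustration only.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for the term table, keyed by (name, ontology_id).
terms: Dict[Tuple[str, Optional[int]], int] = {}


def get_term_id(name: str, ontology_id: Optional[int] = None) -> int:
    """Return the id for (name, ontology_id), creating one if it is missing."""
    key = (name, ontology_id)
    if key not in terms:
        terms[key] = len(terms) + 1  # next id, like an auto-increment key
    return terms[key]


gene_a = get_term_id("gene", ontology_id=1)
gene_b = get_term_id("gene", ontology_id=2)
print(gene_a, gene_b)                      # 1 2: same name, different ontologies
print(get_term_id("gene", ontology_id=1))  # 1: found again, not re-created
```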

_get_taxon_id(self, record)

Get the taxon id for this record (PRIVATE).

record - a SeqRecord object

This searches the taxon/taxon_name tables using the
NCBI taxon ID, scientific name and common name to find
the matching taxon table entry's id.

If the species isn't in the taxon table and at least one of the
NCBI taxon ID, scientific name or common name is available,
a minimal stub entry is created in the table.

Returns the taxon id (database key for the taxon table,
not an NCBI taxon ID), or None if the taxonomy information
is missing.

See also the BioSQL script load_ncbi_taxonomy.pl which
will populate and update the taxon/taxon_name tables
with the latest information from the NCBI.
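The search-then-stub logic can be sketched with plain dictionaries standing in for the taxon and taxon_name tables. The lookup order below (NCBI taxon ID first, then scientific name, then common name) is an assumption based on the description above and is not necessarily the exact order Biopython uses; the dictionaries, counter and function name are hypothetical.

```python
import itertools
from typing import Dict, Optional

# Hypothetical in-memory stand-ins for the taxon and taxon_name tables.
taxon_by_ncbi: Dict[str, int] = {}  # NCBI taxon ID -> taxon_id (database key)
taxon_by_name: Dict[str, int] = {}  # scientific or common name -> taxon_id
_next_taxon_id = itertools.count(1)


def get_taxon_id(ncbi_id: Optional[str] = None,
                 scientific_name: Optional[str] = None,
                 common_name: Optional[str] = None) -> Optional[int]:
    """Look up a taxon by NCBI ID, then by name; create a stub if absent."""
    if not (ncbi_id or scientific_name or common_name):
        return None  # taxonomy information missing
    if ncbi_id and ncbi_id in taxon_by_ncbi:
        return taxon_by_ncbi[ncbi_id]
    for name in (scientific_name, common_name):
        if name and name in taxon_by_name:
            return taxon_by_name[name]
    # Not found anywhere: record a minimal stub entry under every key we have.
    taxon_id = next(_next_taxon_id)
    if ncbi_id:
        taxon_by_ncbi[ncbi_id] = taxon_id
    for name in (scientific_name, common_name):
        if name:
            taxon_by_name[name] = taxon_id
    return taxon_id


stub = get_taxon_id("9606", "Homo sapiens", "human")
print(stub)                               # 1: new stub entry
print(get_taxon_id(common_name="human"))  # 1: found by common name
print(get_taxon_id())                     # None: nothing to look up
```

Note that the returned value is the database key, not the NCBI taxon ID, matching the return contract described above.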

_fix_name_class(self, entrez_name)

Map Entrez name terms to those used in taxdump (PRIVATE).

We need to make this conversion to match the taxon_name.name_class
values used by the BioSQL load_ncbi_taxonomy.pl script.

For example:
    "ScientificName" -> "scientific name"
    "EquivalentName" -> "equivalent name"
    "Synonym" -> "synonym"

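One simple way to perform this mapping, assuming every Entrez name class is written in CamelCase, is to put a space before each capital letter and lower-case it. This reproduces the three examples above; any name class that does not follow the pattern would need an explicit lookup table. The function name is illustrative.

```python
def fix_name_class(entrez_name: str) -> str:
    """Convert an Entrez CamelCase name class to taxdump style."""
    # Put a space before every capital letter and lower-case it,
    # then strip the leading space produced by the first capital.
    return "".join(
        " " + ch.lower() if ch.isupper() else ch for ch in entrez_name
    ).strip()


print(fix_name_class("ScientificName"))  # scientific name
print(fix_name_class("EquivalentName"))  # equivalent name
print(fix_name_class("Synonym"))         # synonym
```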
_get_taxon_id_from_ncbi_taxon_id(self, ncbi_taxon_id, scientific_name=None, common_name=None)

Get the taxon id for this record from the NCBI taxon ID (PRIVATE).

ncbi_taxon_id - string containing an NCBI taxon id
scientific_name - string, used if a stub entry is recorded
common_name - string, used if a stub entry is recorded

This searches the taxon table using ONLY the NCBI taxon ID
to find the matching taxon table entry's ID (database key).

If the species isn't in the taxon table, and the fetch_NCBI_taxonomy
flag is true, Biopython will attempt to go online using Bio.Entrez
to fetch the official NCBI lineage, recursing up the tree until an
existing entry is found in the database or the full lineage has been
fetched.

Otherwise the NCBI taxon ID, scientific name and common name are
recorded as a minimal stub entry in the taxon and taxon_name tables.
Any partial information about the lineage from the SeqRecord is NOT
recorded.  This should mean that (re)running the BioSQL script
load_ncbi_taxonomy.pl can fill in the taxonomy lineage.

Returns the taxon id (database key for the taxon table, not
an NCBI taxon ID).
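The branching described above (reuse an existing row, fetch the lineage online, or fall back to a stub) might look like the sketch below. The TaxonTable class is a hypothetical in-memory stand-in for the taxon table, and fetch_lineage is a placeholder for an online Bio.Entrez query that is assumed to return root-first NCBI IDs ending with the requested one. Rather than recursing, the loop walks the fetched lineage from the root down, which yields the same parent links.

```python
import itertools
from typing import Callable, Dict, List, Optional, Tuple


class TaxonTable:
    """Hypothetical in-memory taxon table: NCBI ID -> (taxon_id, parent_taxon_id)."""

    def __init__(self) -> None:
        self.rows: Dict[str, Tuple[int, Optional[int]]] = {}
        self._ids = itertools.count(1)

    def add(self, ncbi_id: str, parent_id: Optional[int]) -> int:
        taxon_id = next(self._ids)
        self.rows[ncbi_id] = (taxon_id, parent_id)
        return taxon_id


def taxon_id_from_ncbi_taxon_id(table: TaxonTable, ncbi_taxon_id: str,
                                fetch_ncbi_taxonomy: bool,
                                fetch_lineage: Callable[[str], List[str]]) -> int:
    """Return the taxon_id for an NCBI taxon ID, fetching or stubbing if absent."""
    if ncbi_taxon_id in table.rows:
        return table.rows[ncbi_taxon_id][0]
    if not fetch_ncbi_taxonomy:
        # Offline: minimal stub with no parent; the lineage is left for
        # load_ncbi_taxonomy.pl to fill in later.
        return table.add(ncbi_taxon_id, parent_id=None)
    # Online: reuse rows that already exist, add the missing ones with parents.
    parent_id: Optional[int] = None
    for ncbi_id in fetch_lineage(ncbi_taxon_id):
        if ncbi_id in table.rows:
            parent_id = table.rows[ncbi_id][0]
        else:
            parent_id = table.add(ncbi_id, parent_id)
    if parent_id is None:
        raise ValueError("empty lineage for %s" % ncbi_taxon_id)
    return parent_id


toy_lineages = {"100": ["1", "10", "100"]}  # made-up IDs, root first
table = TaxonTable()
print(taxon_id_from_ncbi_taxon_id(table, "555", False, lambda t: toy_lineages[t]))  # 1
print(taxon_id_from_ncbi_taxon_id(table, "100", True, lambda t: toy_lineages[t]))   # 4
print(table.rows["100"])  # (4, 3): parent is the row created for "10"
```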

_get_taxon_id_from_ncbi_lineage(self, taxonomic_lineage)

Record a taxonomic lineage and return a taxon id; this is recursive (PRIVATE).

taxonomic_lineage - list of taxonomy dictionaries from Bio.Entrez

The first dictionary in the list is the taxonomy root; the last is the most specific entry (the species).
Each dictionary includes:
- TaxID (string, NCBI taxon id)
- Rank (string, e.g. "species", "genus", ..., "phylum", ...)
- ScientificName (string)
(and that is all at the time of writing)

This method will record all the lineage given, returning the taxon id
(database key, not NCBI taxon id) of the final entry (the species).
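A minimal recursive sketch of this behaviour, using dictionaries shaped like the Bio.Entrez ones described above: it recurses from the last entry towards the root until it reaches an ID that is already stored, then records each missing entry with its parent's taxon id on the way back. The taxa dictionary is a hypothetical stand-in for the taxon table, and the TaxID values and names in the example are made up.

```python
import itertools
from typing import Dict, List, Optional, Tuple

# Hypothetical in-memory taxon table: NCBI TaxID -> (taxon_id, parent_taxon_id, rank).
# The ScientificName would go to the taxon_name table, which is omitted here.
taxa: Dict[str, Tuple[int, Optional[int], str]] = {}
_ids = itertools.count(1)


def taxon_id_from_lineage(lineage: List[Dict[str, str]]) -> Optional[int]:
    """Record a root-first lineage and return the taxon_id of its last entry."""
    if not lineage:
        return None  # went past the root: no parent
    last = lineage[-1]
    if last["TaxID"] in taxa:
        return taxa[last["TaxID"]][0]  # already stored: stop recursing
    # Recurse on the rest of the lineage to obtain the parent's taxon_id.
    parent_id = taxon_id_from_lineage(lineage[:-1])
    taxon_id = next(_ids)
    taxa[last["TaxID"]] = (taxon_id, parent_id, last["Rank"])
    return taxon_id


lineage = [
    {"TaxID": "1", "Rank": "no rank", "ScientificName": "root"},
    {"TaxID": "10", "Rank": "genus", "ScientificName": "Examplea"},
    {"TaxID": "100", "Rank": "species", "ScientificName": "Examplea exemplaris"},
]
species = taxon_id_from_lineage(lineage)
print(species)      # 3
print(taxa["100"])  # (3, 2, 'species')
```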

_load_bioentry_table(self, record)

Fill the bioentry table with sequence information (PRIVATE).

record - SeqRecord object to add to the database.

_load_bioentry_date(self, record, bioentry_id)

Add the effective date of the entry into the database.

record - a SeqRecord object with an annotated date
bioentry_id - corresponding database identifier

_load_biosequence(self, record, bioentry_id)

Record a SeqRecord's sequence and alphabet in the database (PRIVATE).

record - a SeqRecord object with a seq property
bioentry_id - corresponding database identifier

_load_comment(self, record, bioentry_id)

Record a SeqRecord's annotated comment in the database (PRIVATE).

record - a SeqRecord object with an annotated comment
bioentry_id - corresponding database identifier

_load_annotations(self, record, bioentry_id)

Record a SeqRecord's misc annotations in the database (PRIVATE).

The annotation strings are recorded in the bioentry_qualifier_value
table, except for special cases like the reference, comment and
taxonomy which are handled with their own tables.

record - a SeqRecord object with an annotations dictionary
bioentry_id - corresponding database identifier
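The split between special cases and generic qualifier values can be sketched as below. The SPECIAL_KEYS set is taken from the three cases named above and is an assumption; the real loader may treat other keys specially as well. The annotation values in the example are made up.

```python
from typing import Any, Dict, Tuple

# Keys named in the text as having their own tables; treat this set as an assumption.
SPECIAL_KEYS = {"references", "comment", "taxonomy"}


def route_annotations(annotations: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split annotations into special cases and bioentry_qualifier_value candidates."""
    special = {k: v for k, v in annotations.items() if k in SPECIAL_KEYS}
    generic = {k: v for k, v in annotations.items() if k not in SPECIAL_KEYS}
    return special, generic


special, generic = route_annotations({
    "comment": "Example comment.",
    "taxonomy": ["Eukaryota", "Metazoa"],
    "data_file_division": "PRI",
    "keywords": ["example"],
})
print(sorted(special))  # ['comment', 'taxonomy']
print(sorted(generic))  # ['data_file_division', 'keywords']
```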

_load_reference(self, reference, rank, bioentry_id)

Record a SeqRecord's annotated references in the database (PRIVATE).

reference - a Reference object from the SeqRecord's annotations
rank - position of this reference among the record's references
bioentry_id - corresponding database identifier

_load_seqfeature_basic(self, feature_type, feature_rank, bioentry_id)

Load the first tables of a seqfeature and return the id (PRIVATE).

This loads the "key" of the seqfeature (e.g. CDS, gene) and
the basic seqfeature table itself.

_load_seqfeature_locations(self, feature, seqfeature_id)

Load all of the locations for a SeqFeature into tables (PRIVATE).

This adds the locations related to the SeqFeature into the
seqfeature_location table. Fuzzy locations are not handled at present.
For a simple location, e.g. (1..2), we have a single table row
with seq_start = 1, seq_end = 2, location_rank = 1.

For split locations, e.g. (1..2, 3..4, 5..6), we would have three
table rows with:
    start = 1, end = 2, rank = 1
    start = 3, end = 4, rank = 2
    start = 5, end = 6, rank = 3
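The rank numbering above can be reproduced by enumerating the parts of a location. This sketch uses plain (start, end) tuples in the one-based GenBank style of the examples, and the function name is illustrative. Biopython's own location objects use zero-based start positions, so storing them in this style requires a conversion that this sketch does not attempt.

```python
from typing import List, Tuple

Row = Tuple[int, int, int]  # (start, end, rank)


def location_rows(parts: List[Tuple[int, int]]) -> List[Row]:
    """Turn the (start, end) parts of a location into ranked table rows."""
    # A simple location is a split location with one part, so both cases
    # share this code path; rank counts from 1 in part order.
    return [(start, end, rank) for rank, (start, end) in enumerate(parts, start=1)]


print(location_rows([(1, 2)]))                  # [(1, 2, 1)]
print(location_rows([(1, 2), (3, 4), (5, 6)]))  # [(1, 2, 1), (3, 4, 2), (5, 6, 3)]
```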

_insert_location(self, location, rank, seqfeature_id)

Add a location of a SeqFeature to the seqfeature_location table (PRIVATE).

TODO - Add location operator to location_qualifier_value?

_load_seqfeature_qualifiers(self, qualifiers, seqfeature_id)

Insert the (key, value) pair qualifiers relating to a feature (PRIVATE).

Qualifiers should be a dictionary of the form:
    {key : [value1, value2]}
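Flattening such a dictionary into one row per value might look like the following, with rank recording each value's position within its key's list. Using rank this way is an assumption for illustration, and the function name and qualifier values are hypothetical.

```python
from typing import Dict, List, Tuple


def qualifier_rows(qualifiers: Dict[str, List[str]]) -> List[Tuple[str, str, int]]:
    """Flatten {key: [value1, value2]} into (key, value, rank) rows."""
    rows = []
    for key, values in qualifiers.items():
        # Each value gets its own row; rank preserves the value order per key.
        for rank, value in enumerate(values, start=1):
            rows.append((key, value, rank))
    return rows


print(qualifier_rows({"gene": ["abcA"], "note": ["first", "second"]}))
# [('gene', 'abcA', 1), ('note', 'first', 1), ('note', 'second', 2)]
```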

_load_seqfeature_dbxref(self, dbxrefs, seqfeature_id)

Add database cross-references of a SeqFeature to the database (PRIVATE).

o dbxrefs           List, dbxref data from the source file in the
                    format <database>:<accession>

o seqfeature_id     Int, the identifier for the seqfeature in the
                    seqfeature table

Insert dbxref qualifier data for a seqfeature into the
seqfeature_dbxref table and, if required, the dbxref table.
Each dbxref is stored in the dbxref table as a (dbname, accession,
version) tuple, with dbxref.dbxref_id assigned automatically, and is
linked to the feature in the seqfeature_dbxref table as a
(seqfeature_id, dbxref_id, rank) tuple.
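Parsing the <database>:<accession> strings could be done as follows. Splitting on the first colon only is an assumption, chosen so that an accession which itself contains a colon stays intact; the rank numbering, the example identifiers and the function name are illustrative.

```python
from typing import List, Tuple


def parse_dbxrefs(dbxrefs: List[str]) -> List[Tuple[str, str, int]]:
    """Split '<database>:<accession>' strings into (dbname, accession, rank)."""
    parsed = []
    for rank, dbxref in enumerate(dbxrefs, start=1):
        dbname, sep, accession = dbxref.partition(":")  # split on first colon only
        if not sep:
            raise ValueError("expected <database>:<accession>, got %r" % dbxref)
        parsed.append((dbname.strip(), accession.strip(), rank))
    return parsed


print(parse_dbxrefs(["GeneID:12345", "GI:67890"]))
# [('GeneID', '12345', 1), ('GI', '67890', 2)]
```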

_get_dbxref_id(self, db, accession)

o db          String, the name of the external database containing
              the accession number

o accession   String, the accession of the dbxref data

Finds and returns the dbxref_id for the passed data.  The method
attempts to find an existing record first, and inserts the data
if there is no record.

Returns:
Int

_get_seqfeature_dbxref(self, seqfeature_id, dbxref_id, rank)

Check for a pre-existing seqfeature_dbxref entry with the passed
seqfeature_id and dbxref_id. If one does not exist, insert new
data.
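The check-then-insert pattern can be sketched with sqlite3 and a cut-down, hypothetical seqfeature_dbxref table; calling it twice with the same ids leaves a single row. The column name rank is quoted because RANK is a reserved word in some SQL dialects (for example MySQL 8). The function name mirrors the method above, but this is an illustration, not Biopython's code.

```python
import sqlite3
from typing import Tuple


def get_seqfeature_dbxref(conn: sqlite3.Connection, seqfeature_id: int,
                          dbxref_id: int, rank: int) -> Tuple[int, int]:
    """Return (seqfeature_id, dbxref_id), inserting the link row only if missing."""
    row = conn.execute(
        "SELECT seqfeature_id, dbxref_id FROM seqfeature_dbxref"
        " WHERE seqfeature_id = ? AND dbxref_id = ?",
        (seqfeature_id, dbxref_id),
    ).fetchone()
    if row is None:
        conn.execute(
            'INSERT INTO seqfeature_dbxref (seqfeature_id, dbxref_id, "rank")'
            " VALUES (?, ?, ?)",
            (seqfeature_id, dbxref_id, rank),
        )
    return (seqfeature_id, dbxref_id)


conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE seqfeature_dbxref (seqfeature_id INTEGER, dbxref_id INTEGER,'
    ' "rank" INTEGER, PRIMARY KEY (seqfeature_id, dbxref_id))'
)
result = get_seqfeature_dbxref(conn, 7, 3, 1)
get_seqfeature_dbxref(conn, 7, 3, 1)  # second call finds the row, no duplicate
print(conn.execute("SELECT COUNT(*) FROM seqfeature_dbxref").fetchone()[0])  # 1
```

The same pattern applies to the bioentry_dbxref link table, with bioentry_id in place of seqfeature_id.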

_add_seqfeature_dbxref(self, seqfeature_id, dbxref_id, rank)

Insert a seqfeature_dbxref row and return the seqfeature_id and
dbxref_id.

_load_dbxrefs(self, record, bioentry_id)

Load any sequence level cross references into the database (PRIVATE).

See table bioentry_dbxref.

_get_bioentry_dbxref(self, bioentry_id, dbxref_id, rank)

Check for a pre-existing bioentry_dbxref entry with the passed
bioentry_id and dbxref_id. If one does not exist, insert new
data.

_add_bioentry_dbxref(self, bioentry_id, dbxref_id, rank)

Insert a bioentry_dbxref row and return the bioentry_id and
dbxref_id.