Class _IndexedSeqFileDict

UserDict.DictMixin --+
                     |
                    _IndexedSeqFileDict
Known Subclasses:

Read only dictionary interface to a sequential record file.

This code is used both in Bio.SeqIO, for indexing records as SeqRecord
objects, and in Bio.SearchIO, for indexing QueryResult objects.

Keeps the keys and associated file offsets in memory, and reads the
file on demand, parsing each entry into an object only when it is
accessed. Memory use therefore grows with the number of keys, but the
approach still works with millions of records.

Note that duplicate keys are not allowed. If a duplicate key is
found, a ValueError exception is raised.
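
The approach described above (keys and offsets in memory, records parsed on demand, ValueError on duplicate keys) can be illustrated with a minimal sketch. This is not the Biopython implementation: the class name OffsetIndexedFasta, the demo helper, and the simplified FASTA parsing are assumptions made for illustration only.

```python
import os
import tempfile


class OffsetIndexedFasta:
    """Hypothetical read-only index over a FASTA file (illustration only)."""

    def __init__(self, path, key_function=None):
        self._path = path
        self._offsets = {}  # key -> byte offset of the record's ">" line
        with open(path, "rb") as handle:
            while True:
                offset = handle.tell()
                line = handle.readline()
                if not line:
                    break
                if line.startswith(b">"):
                    record_id = line[1:].split(None, 1)[0].decode("ascii")
                    key = key_function(record_id) if key_function else record_id
                    if key in self._offsets:
                        raise ValueError("Duplicate key %r" % key)
                    self._offsets[key] = offset

    def __len__(self):
        return len(self._offsets)

    def __contains__(self, key):
        return key in self._offsets

    def __iter__(self):
        return iter(self._offsets)

    def __getitem__(self, key):
        # Seek to the stored offset and parse only this one record.
        with open(self._path, "rb") as handle:
            handle.seek(self._offsets[key])  # KeyError if the key is unknown
            header = handle.readline().decode("ascii").rstrip()
            chunks = []
            for line in handle:
                if line.startswith(b">"):
                    break
                chunks.append(line.strip().decode("ascii"))
        return header[1:], "".join(chunks)


def demo():
    """Build a two-record file and look one record up by key."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.fasta")
        with open(path, "w") as out:
            out.write(">alpha first\nACGT\nAC\n>beta second\nGGCC\n")
        index = OffsetIndexedFasta(path)
        return len(index), sorted(index), index["beta"]
```

Only the small offset table lives in memory here; sequence data is read from disk each time a key is looked up, which is the trade-off the class description refers to.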

As used in Bio.SeqIO, by default the SeqRecord's id string is used
as the dictionary key. In Bio.SearchIO, the query's id string is
used. This can be changed by supplying an optional key_function,
a callback function which will be given the record id and must
return the desired key. For example, this allows you to parse
NCBI style FASTA identifiers, and extract the GI number to use
as the dictionary key.
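
As a rough sketch of such a callback, the function below pulls the GI number out of an NCBI style identifier. The name get_gi and the decision to raise ValueError on unexpected input are assumptions for illustration, not part of this class.

```python
def get_gi(record_id):
    """Return the GI number from an identifier like 'gi|6273291|gb|AF191665.1|'.

    Hypothetical key_function: receives the record id, returns the desired key.
    """
    parts = record_id.split("|")
    if len(parts) < 2 or parts[0] != "gi":
        raise ValueError("Not a GI style identifier: %r" % record_id)
    return parts[1]
```

With Biopython installed, a function like this could be passed as the key_function argument, for example SeqIO.index("example.fasta", "fasta", key_function=get_gi), where the file name is a placeholder.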

Note that this dictionary is essentially read only. You cannot
add or change values, pop values, nor clear the dictionary.
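
One way to picture this read-only behaviour is the sketch below, where lookups work and every mutating method refuses. The class name ReadOnlyRecords is hypothetical, and raising NotImplementedError is an assumption; this page only says the methods are "not implemented".

```python
from collections.abc import Mapping


class ReadOnlyRecords(Mapping):
    """Hypothetical stand-in: lookups work, mutation is refused."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)

    def _read_only(self, *args, **kwargs):
        # Assumed behaviour: refuse any attempt to modify the mapping.
        raise NotImplementedError("This dictionary is read only.")

    # Route every mutating dictionary method to the same refusal.
    __setitem__ = update = pop = popitem = clear = _read_only
```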

Instance Methods

__init__(self, random_access_proxy, key_function, repr, obj_repr)

__repr__(self)

__str__(self)

__contains__(self, key)

__len__(self)
How many records are there?

itervalues(self)
Iterate over the SeqRecord items.

iteritems(self)
Iterate over the (key, SeqRecord) items.

iterkeys(self)
Iterate over the keys.

items(self)
Would be a list of the (key, SeqRecord) tuples, but not implemented.

values(self)
Would be a list of the SeqRecord objects, but not implemented.

keys(self)
Return a list of all the keys (SeqRecord identifiers).

__iter__(self)
Iterate over the keys.

__getitem__(x, y)
x[y]

get(D, k, d=...)
D[k] if k in D, else d. d defaults to None.

get_raw(self, key)
Similar to the get method, but returns the record as a raw string.

__setitem__(self, key, value)
Would allow setting or replacing records, but not implemented.

update(self, *args, **kwargs)
Would allow adding more values, but not implemented.

pop(self, key, default=None)
Would remove specified record, but not implemented.

popitem(self)
Would remove and return a SeqRecord, but not implemented.

clear(self)
Would clear dictionary, but not implemented.

fromkeys(self, keys, value=None)
A dictionary method which we don't implement.

copy(self)
A dictionary method which we don't implement.

Inherited from UserDict.DictMixin: __cmp__, has_key, setdefault

Method Details

__repr__(self)
(Representation operator)

Overrides: UserDict.DictMixin.__repr__

__contains__(self, key)
(In operator)

Overrides: UserDict.DictMixin.__contains__

__len__(self)
(Length operator)

How many records are there?

Overrides: UserDict.DictMixin.__len__

itervalues(self)

Iterate over the SeqRecord items.

Overrides: UserDict.DictMixin.itervalues

iteritems(self)

Iterate over the (key, SeqRecord) items.

Overrides: UserDict.DictMixin.iteritems

iterkeys(self)

Iterate over the keys.

Overrides: UserDict.DictMixin.iterkeys

items(self)

Would be a list of the (key, SeqRecord) tuples, but not implemented.

In general you may be indexing very large files, with millions of
sequences. Loading all of these into memory at once as SeqRecord
objects would probably exhaust the available RAM, so this dictionary
method is deliberately not supported.

Overrides: UserDict.DictMixin.items

values(self)

Would be a list of the SeqRecord objects, but not implemented.

In general you may be indexing very large files, with millions of
sequences. Loading all of these into memory at once as SeqRecord
objects would probably exhaust the available RAM, so this dictionary
method is deliberately not supported.

Overrides: UserDict.DictMixin.values
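
Since items and values are unavailable, records are typically processed one at a time instead. The sketch below shows that pattern; the function name total_length is an assumption, and it works with any mapping whose values support len().

```python
def total_length(index):
    """Sum record lengths while holding only one record at a time."""
    total = 0
    for key in index:        # iterating the dictionary yields keys only
        record = index[key]  # each record is fetched on demand
        total += len(record)
    return total
```

Because each record goes out of scope before the next lookup, peak memory stays close to the size of a single record plus the key index.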

__iter__(self)

Iterate over the keys.

Overrides: UserDict.DictMixin.__iter__

get(D, k, d=...)

d defaults to None.

Returns:
D[k] if k in D, else d

Overrides: UserDict.DictMixin.get

get_raw(self, key)

Similar to the get method, but returns the record as a raw string.

If the key is not found, a KeyError exception is raised.

Note that on Python 3 a bytes string is returned, not a typical
unicode string.

NOTE - This functionality is not supported for every file format.
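
A common use of raw access is copying selected records to another file without parsing them. The sketch below assumes only that the index object has a get_raw(key) method returning bytes; the names write_subset and FakeIndex are hypothetical, and FakeIndex exists purely so the example runs on its own.

```python
import io


def write_subset(index, keys, out_handle):
    """Write the raw bytes of the selected records to a binary handle."""
    count = 0
    for key in keys:
        out_handle.write(index.get_raw(key))  # KeyError if the key is missing
        count += 1
    return count


class FakeIndex:
    """Hypothetical stand-in exposing only get_raw, for demonstration."""

    def __init__(self, raw_records):
        self._raw = raw_records

    def get_raw(self, key):
        return self._raw[key]
```

The output handle must be opened in binary mode, since the raw records are bytes rather than text.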

update(self, *args, **kwargs)

Would allow adding more values, but not implemented.

Overrides: UserDict.DictMixin.update

pop(self, key, default=None)

Would remove specified record, but not implemented.

Overrides: UserDict.DictMixin.pop

popitem(self)

Would remove and return a SeqRecord, but not implemented.

Overrides: UserDict.DictMixin.popitem

clear(self)

Would clear dictionary, but not implemented.

Overrides: UserDict.DictMixin.clear