Package Bio :: Module bgzf :: Class BgzfReader

Class BgzfReader

object --+
         |
        BgzfReader

BGZF reader. It acts like a read-only handle, but seek and tell work with BGZF virtual offsets rather than plain byte positions.

Let's use the BgzfBlocks function to have a peek at the BGZF blocks in an example BAM file:

>>> try:
...     from __builtin__ import open # Python 2
... except ImportError:
...     from builtins import open # Python 3
...
>>> handle = open("SamBam/ex1.bam", "rb")
>>> for values in BgzfBlocks(handle):
...     print("Raw start %i, raw length %i; data start %i, data length %i" % values)
Raw start 0, raw length 18239; data start 0, data length 65536
Raw start 18239, raw length 18223; data start 65536, data length 65536
Raw start 36462, raw length 18017; data start 131072, data length 65536
Raw start 54479, raw length 17342; data start 196608, data length 65536
Raw start 71821, raw length 17715; data start 262144, data length 65536
Raw start 89536, raw length 17728; data start 327680, data length 65536
Raw start 107264, raw length 17292; data start 393216, data length 63398
Raw start 124556, raw length 28; data start 456614, data length 0
>>> handle.close()
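
To make the four reported values concrete, here is a minimal standard-library sketch of how such a block listing could be produced. It is not Biopython's BgzfBlocks implementation. It assumes the BGZF block layout as described in the SAM/BAM specification: a 12-byte gzip header with the extra-field flag set, a "BC" subfield holding the block size minus one, a raw deflate payload, and a CRC32 plus uncompressed-length trailer. The names make_block and walk_blocks are hypothetical helpers introduced only for this illustration.

```python
import struct
import zlib
from io import BytesIO

HEADER = "<BBBBIBBH"  # ID1, ID2, CM, FLG, MTIME, XFL, OS, XLEN (12 bytes)


def make_block(data):
    """Build one BGZF block from a small chunk of data (hypothetical helper)."""
    deflater = zlib.compressobj(6, zlib.DEFLATED, -15)  # negative wbits: raw deflate
    payload = deflater.compress(data) + deflater.flush()
    total = 12 + 6 + len(payload) + 8  # header + BC subfield + payload + trailer
    head = struct.pack(HEADER, 31, 139, 8, 4, 0, 0, 255, 6)
    extra = b"BC" + struct.pack("<HH", 2, total - 1)  # BSIZE is block size minus 1
    tail = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return head + extra + payload + tail


def walk_blocks(handle):
    """Yield (raw start, raw length, data start, data length) for each block."""
    data_start = 0
    while True:
        raw_start = handle.tell()
        header = handle.read(12)
        if not header:
            return  # clean end of file
        id1, id2, cm, flg, _mtime, _xfl, _os, xlen = struct.unpack(HEADER, header)
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError("Not a BGZF block at raw offset %i" % raw_start)
        extra = handle.read(xlen)
        raw_length = None
        pos = 0
        while pos + 4 <= len(extra):  # scan the gzip extra subfields for "BC"
            slen = struct.unpack("<H", extra[pos + 2:pos + 4])[0]
            if extra[pos:pos + 2] == b"BC" and slen == 2:
                raw_length = struct.unpack("<H", extra[pos + 4:pos + 6])[0] + 1
            pos += 4 + slen
        if raw_length is None:
            raise ValueError("Missing BC subfield at raw offset %i" % raw_start)
        # The last four bytes of the block hold the decompressed length (ISIZE).
        handle.seek(raw_start + raw_length - 4)
        data_length = struct.unpack("<I", handle.read(4))[0]
        yield raw_start, raw_length, data_start, data_length
        data_start += data_length


# Two blocks in memory: 1000 bytes of data, then an empty block.
raw = make_block(b"ACGT" * 250) + make_block(b"")
for values in walk_blocks(BytesIO(raw)):
    print("Raw start %i, raw length %i; data start %i, data length %i" % values)
```

The sketch never decompresses anything: each block's compressed size comes from its header and its decompressed size from its trailer, which is why listing blocks is cheap. The empty block built here is 28 bytes long, the same raw length as the final zero-data block in the listing above.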

Now let's see how to use this block information to jump to specific parts of the decompressed BAM file:

>>> handle = BgzfReader("SamBam/ex1.bam", "rb")
>>> assert 0 == handle.tell()
>>> magic = handle.read(4)
>>> assert 4 == handle.tell()

So far nothing strange: we got the magic marker used at the start of a decompressed BAM file, and the handle position makes sense. Now, however, let's read 65536 bytes, which takes us past the end of this block and 4 bytes into the next one:

>>> data = handle.read(65536)
>>> len(data)
65536
>>> assert 1195311108 == handle.tell()

Expecting 4 + 65536 = 65540, were you? Well, this is a BGZF 64-bit virtual offset, which means:

>>> split_virtual_offset(1195311108)
(18239, 4)

You should recognize 18239 as the raw start of the second BGZF block, while 4 is the offset into that block's decompressed data. See also make_virtual_offset:

>>> make_virtual_offset(18239, 4)
1195311108
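
The arithmetic behind these two calls can be sketched in a few lines. This assumes the usual BGZF convention that the upper 48 bits of a virtual offset hold the raw start of the block and the lower 16 bits hold the position within the decompressed block. The names pack_virtual_offset and unpack_virtual_offset are hypothetical stand-ins for illustration, not the library functions themselves.

```python
def pack_virtual_offset(block_start, within_block):
    """Combine a raw block start and a within-block offset (illustrative only)."""
    if not 0 <= within_block < 2 ** 16:
        raise ValueError("Within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2 ** 48:
        raise ValueError("Block start must fit in 48 bits")
    return (block_start << 16) | within_block


def unpack_virtual_offset(virtual_offset):
    """Split a virtual offset into (block start, within-block offset)."""
    return virtual_offset >> 16, virtual_offset & 0xFFFF


print(pack_virtual_offset(18239, 4))      # 1195311108
print(unpack_virtual_offset(1195311108))  # (18239, 4)
```

Because the first block starts at raw offset 0, any virtual offset inside it equals the plain decompressed offset, which is why the offsets 0 and 4 earlier looked unremarkable.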

Let's jump back to almost the start of the file,

>>> make_virtual_offset(0, 2)
2
>>> handle.seek(2)
2
>>> handle.close()

Note that you can use the max_cache argument to limit the number of BGZF blocks cached in memory. The default is 100, and since each block can hold up to 64 KiB of decompressed data, the default cache could take up to about 6 MB of RAM. The cache does not matter when reading through the file in a single pass, but it does improve the performance of random access.
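
As a rough check on that memory figure, the following sketch computes the worst-case size of the cached block data for a given max_cache. It assumes every cached block holds the full 64 KiB and ignores per-object overhead. The function name cache_upper_bound is hypothetical.

```python
BLOCK_SIZE = 64 * 1024  # maximum decompressed size of one BGZF block, in bytes


def cache_upper_bound(max_cache=100):
    """Worst-case bytes of decompressed block data held in the cache."""
    return max_cache * BLOCK_SIZE


print(cache_upper_bound())            # 6553600
print(cache_upper_bound() / 2 ** 20)  # 6.25
```

Lowering max_cache trades memory for repeated decompression when random access revisits blocks that have been dropped from the cache.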

Instance Methods

__init__(self, filename=None, mode='r', fileobj=None, max_cache=100)
x.__init__(...) initializes x; see x.__class__.__doc__ for signature

_load_block(self, start_offset=None)

tell(self)
Returns a 64-bit unsigned BGZF virtual offset.

seek(self, virtual_offset)
Seek to a 64-bit unsigned BGZF virtual offset.

read(self, size=-1)

readline(self)

__next__(self)

next(self)
Python 2 style alias for Python 3 style __next__ method.

__iter__(self)

close(self)

seekable(self)

isatty(self)

fileno(self)

__enter__(self)

__exit__(self, type, value, traceback)

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Properties

Inherited from object: __class__

Method Details

__init__(self, filename=None, mode='r', fileobj=None, max_cache=100)
(Constructor)

x.__init__(...) initializes x; see x.__class__.__doc__ for signature

Overrides: object.__init__
(inherited documentation)