Reading from unix pipes
Giles.weaver (Created page)
Revision as of 12:21, 5 June 2009
Problem
There are many circumstances when reading data from a Unix pipe is preferable to reading data from a file. One example is reading sequences from a compressed file, which is often preferable to uncompressing the file and then reading from it.
You can also use pipes at the Windows command line, but this isn't quite as flexible.
Solution
This example script reads a Solexa/Illumina FASTQ from stdin, converts the data to Sanger FASTQ (using PHRED scores) and writes it to stdout.
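<python>
import sys
from Bio import SeqIO

# Parse Solexa-encoded FASTQ records lazily from standard input
recs = SeqIO.parse(sys.stdin, "fastq-solexa")
# Write them to standard output; "fastq" is the Sanger (PHRED) variant
SeqIO.write(recs, sys.stdout, "fastq")
</python>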
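To make the quality conversion concrete, here is a minimal standard-library sketch of the usual Solexa-to-PHRED formula, Q_phred = 10 * log10(10^(Q_solexa / 10) + 1), applied to a quality string. The helper names `solexa_to_phred` and `convert_quality_string` are hypothetical and are not part of Biopython, and rounding to the nearest integer is an assumption of this sketch. It only illustrates the arithmetic; the script above leaves the real work to `SeqIO`.

```python
import math

SOLEXA_OFFSET = 64  # ASCII offset of Solexa / early Illumina quality characters
SANGER_OFFSET = 33  # ASCII offset of Sanger quality characters


def solexa_to_phred(solexa_score):
    """Convert one Solexa quality score to the PHRED scale (as a float)."""
    return 10 * math.log10(10 ** (solexa_score / 10) + 1)


def convert_quality_string(solexa_quals):
    """Re-encode a Solexa quality string as a Sanger quality string."""
    converted = []
    for char in solexa_quals:
        solexa_score = ord(char) - SOLEXA_OFFSET
        # Assumption of this sketch: round to the nearest integer PHRED score
        phred_score = round(solexa_to_phred(solexa_score))
        converted.append(chr(phred_score + SANGER_OFFSET))
    return "".join(converted)


# 'h', '@' and ';' encode Solexa scores 40, 0 and -5 respectively
print(convert_quality_string("h@;"))
```

High scores are almost unchanged by the conversion, while low and negative Solexa scores map to small positive PHRED scores.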
The following bash command decompresses the sequence file and pipes the result to the script (saved here as solexa2sanger_fq.py).
<bash>
gunzip -c some_solexa.fastq.gz | python solexa2sanger_fq.py
</bash>
This will write the sequence in Sanger FASTQ format to stdout - in this case the screen.
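Where `gunzip` is not available (for example at a plain Windows command line), the decompression step can be approximated inside Python with the standard `gzip` module. The sketch below is an assumption-laden illustration rather than part of the original recipe: the function name `stream_gzipped_text` is hypothetical, and it simply copies decompressed lines to an output handle, much as `gunzip -c` would.

```python
import gzip
import sys


def stream_gzipped_text(path, out=sys.stdout):
    """Decompress a gzipped text file line by line and write it to `out`.

    Returns the number of lines written.
    """
    lines_written = 0
    # Mode "rt" decompresses on the fly and yields text rather than bytes
    with gzip.open(path, "rt") as handle:
        for line in handle:
            out.write(line)
            lines_written += 1
    return lines_written
```

A handle opened with `gzip.open(path, "rt")` is an ordinary text handle, so it should also be usable in place of `sys.stdin` in the call to `SeqIO.parse`.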
You can also use it with shell redirection, where the Python script is fed its data from an input file, and the output that would have been printed to the screen is sent to an output file instead:
<bash>
python solexa2sanger_fq.py < some_solexa.fastq > some_phred.fastq
</bash>
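The pipe and the redirection are interchangeable because the script only ever sees a stream of text on stdin and consumes it one record at a time. As a rough illustration of that streaming behaviour, the sketch below reads records from any text handle. The generator `iter_fastq` is hypothetical and assumes strict four-line FASTQ records with no line wrapping; `SeqIO.parse` handles the general case.

```python
import io


def iter_fastq(handle):
    """Yield (title, sequence, quality) tuples from a four-line-per-record stream."""
    while True:
        title = handle.readline()
        if not title:
            break  # end of stream reached
        sequence = handle.readline()
        handle.readline()  # skip the "+" separator line
        quality = handle.readline()
        # Drop the leading "@" from the title and the trailing newlines
        yield title.rstrip("\n")[1:], sequence.rstrip("\n"), quality.rstrip("\n")


# An in-memory handle stands in for sys.stdin in this example
example = io.StringIO("@read1\nACGT\n+\nhhhh\n@read2\nTTGA\n+\nhh@;\n")
for title, sequence, quality in iter_fastq(example):
    print(title, sequence, quality)
```

In a real script, `sys.stdin` would be passed in place of the `io.StringIO` handle, and only one record needs to be held in memory at a time.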