Reading from unix pipes

From Biopython
Latest revision as of 08:32, 9 October 2009

Problem

There are many circumstances in which reading data from a Unix pipe is preferable to reading it from a file. One example is reading sequences from a compressed file: streaming the decompressed data through a pipe avoids having to write an uncompressed copy to disk before reading it.

You can also use pipes at the Windows command line, but this isn't quite as flexible.

Solution

This example script uses Bio.SeqIO to read a Solexa/Illumina FASTQ file from stdin, convert the data to Sanger FASTQ (which uses PHRED scores) and write it to stdout. See the more general page on converting sequence files for some background.

import sys
from Bio import SeqIO

# Read Solexa/Illumina FASTQ from stdin and write Sanger FASTQ to stdout.
SeqIO.convert(sys.stdin, "fastq-solexa", sys.stdout, "fastq")
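
To make the quality conversion concrete, here is a minimal standard-library sketch of the score mapping between the two formats. It assumes the usual conventions: Solexa-style quality strings use an ASCII offset of 64, Sanger strings use an offset of 33 with PHRED scores, and the two scales are related by Q_phred = 10 * log10(10^(Q_solexa / 10) + 1). The function names are hypothetical and this is an illustration only, not a description of how Bio.SeqIO is implemented.

```python
import math


def solexa_to_phred(q_solexa):
    """Map a Solexa quality score to the equivalent PHRED score (a float)."""
    return 10 * math.log10(10 ** (q_solexa / 10) + 1)


def solexa_qual_to_sanger(qual):
    """Re-encode a Solexa quality string (offset 64) as Sanger (offset 33).

    Assumption: PHRED scores are rounded to the nearest integer.
    """
    return "".join(
        chr(int(round(solexa_to_phred(ord(char) - 64))) + 33) for char in qual
    )
```

High scores are almost unchanged by the mapping, while Solexa scores at or below zero map to small positive PHRED scores.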

Pipes are a command-line feature that directs the stdout of one program or command to the stdin of another. For example, the following shell command decompresses the sequence file and pipes it to the script (saved as solexa2sanger_fq.py).

gunzip -c some_solexa.fastq.gz | python solexa2sanger_fq.py

This will write the sequences in Sanger FASTQ format to stdout, which in this case is the screen.
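
The same plumbing can be set up from within Python, which may help show what the shell's | operator does. The sketch below uses only the standard subprocess module; the helper name pipe_commands is hypothetical, and the approach is one reasonable way to connect two processes rather than the only one.

```python
import subprocess


def pipe_commands(producer_cmd, consumer_cmd):
    """Run `producer_cmd | consumer_cmd`; return the consumer's stdout as bytes."""
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE
    )
    # Close our copy of the pipe so the producer sees a broken pipe
    # if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output
```

Assuming gunzip and the script are available, pipe_commands(["gunzip", "-c", "some_solexa.fastq.gz"], ["python", "solexa2sanger_fq.py"]) would mirror the shell command above, returning the converted data instead of printing it to the screen.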

Redirection is similar to using pipes, but instead of connecting the output of one program to the input of another, it connects a file to a program's stdin, sends a program's stdout to a file, or both. In this example, the Python script reads its data from an input file, and the output that would otherwise have been printed to the screen is written to an output file:

python solexa2sanger_fq.py < some_solexa.fastq > some_phred.fastq

Redirection can also be used to send a program's or command's stderr to a file. Further examples of using redirection can be found at http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO-3.html.
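
As a rough Python counterpart to the <, > and stderr redirections described above, the following sketch opens the files itself and hands them to a child process using the standard subprocess module. The helper name run_redirected and its parameters are hypothetical, chosen for illustration.

```python
import subprocess


def run_redirected(cmd, in_path, out_path, err_path=None):
    """Run cmd with stdin read from in_path and stdout written to out_path.

    If err_path is given, stderr is written there too; otherwise the child
    inherits the parent's stderr. Returns the command's exit status.
    """
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        if err_path is None:
            return subprocess.run(cmd, stdin=fin, stdout=fout).returncode
        with open(err_path, "wb") as ferr:
            return subprocess.run(
                cmd, stdin=fin, stdout=fout, stderr=ferr
            ).returncode
```

For instance, run_redirected(["python", "solexa2sanger_fq.py"], "some_solexa.fastq", "some_phred.fastq") would correspond to the redirection example above.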
