Biowulf at the NIH
SOAP3 on Biowulf

SOAP3 is GPU-based software for aligning short reads to a reference sequence. It can find all alignments with k mismatches, where k is chosen from 0 to 3. Compared with its predecessor SOAP2, SOAP3 can be up to tens of times faster. For example, when aligning length-100 reads with the human genome, SOAP3 is the first software that can find all 3-mismatch alignments in tens of seconds per one million reads.

The alignment program in this package is optimized to process several million short reads per run by using a multi-core CPU and the GPU concurrently.

To exploit the parallelism of the GPU effectively, SOAP3 uses an adapted version of the 2BWT index of SOAP2 (the new index is called the GPU-2BWT). The index and algorithms were developed by the algorithms research group of the University of Hong Kong (T.W. Lam, C.M. Liu, Thomas Wong, Edward Wu and S.M. Yiu). Please remember to cite their work:

Citation:
- SOAP3 Publication
- To reference SOAPsnp, please cite this paper: Ruiqiang Li, Yingrui Li, Xiaodong Fang, et al. (2009) "SNP detection for massively parallel whole-genome resequencing". Genome Res., doi:10.1101/gr.088013.108
- To reference SOAP2, please cite this paper: Li et al. (2009) "SOAP2: an improved ultrafast tool for short read alignment". Bioinformatics, doi:10.1093/bioinformatics/btp336
- To reference the SOAP v1 program, please cite this paper: Li et al. (2008) "SOAP: short oligonucleotide alignment program". Bioinformatics, 24(5):713

GPU NODES ONLY - SOAP3 runs _ONLY_ on the GPU nodes of the Biowulf cluster.

INDEX FILES - We can create shared index files for users on request. Currently, hg18, hg19 and mm9 index files for SOAP3 are located under /fdb/soap3.
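
Before building your own index, it can be worth checking whether a shared one already exists. The following is a minimal sketch only: the function name list_shared_index is hypothetical, and it assumes that entries under /fdb/soap3 start with the genome name (for example hg19), which this page does not confirm. List the directory yourself to see the actual layout.

```shell
# list_shared_index DIR GENOME   (hypothetical helper, not part of SOAP3)
# Prints every entry under DIR whose name starts with GENOME.
# Returns non-zero when nothing matches.
list_shared_index() {
    local dir="$1" genome="$2"
    local found=1 entry
    for entry in "$dir"/"$genome"*; do
        # An unmatched glob stays literal, so confirm the entry really exists.
        if [ -e "$entry" ]; then
            echo "$entry"
            found=0
        fi
    done
    return "$found"
}

# Example call (the naming under /fdb/soap3 is an assumption):
#   list_shared_index /fdb/soap3 hg19 || echo "no shared hg19 index found"
```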

The SOAP package includes many programs; see /usr/local/apps/soap.

To run the SOAP2 packages, see the Biowulf SOAP page.

Running a single SOAP3 job on Biowulf

1. See http://www.cs.hku.hk/2bwt-tools/soap3-dp/ for instructions.

2. Create a directory for your job, /data/user/soap/run1 for example.

3. Put your reference file (ref1.fa for example) and input file(s) with reads (infile1.fa and infile2.fa for example) under this directory.

4. Copy the initialization files /usr/local/soap/soap3/*.ini to this directory and modify them to set the parameters you want.

5. Set up a batch script along the following lines:

#!/bin/bash
#PBS -m be
# this file is soap3_script

module load soap3

cd /data/user/soap/run1
soap3-dp-builder ref1.fa
BGS-Build ref1.fa.index
soap3-dp pair ref1.fa.index infile1.fa infile2.fa -u 500 -v 200
merge-succinct.sh infile1.fa

Submit this job to the batch system with the command:
biowulf% qsub -l nodes=1:gpu2050 soap3_script
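
Steps 2 to 4 above (create the run directory, copy in the inputs, copy the .ini files) can be sketched as a small shell function. This is an illustration rather than an official Biowulf tool: the function name prepare_run_dir is hypothetical, and the paths in the example call are the ones used on this page, so substitute your own.

```shell
# prepare_run_dir RUN_DIR INI_SRC FILE...   (hypothetical helper)
# Creates RUN_DIR, copies the *.ini files from INI_SRC into it,
# then copies each listed reference/read file into it.
prepare_run_dir() {
    local run_dir="$1" ini_src="$2" f
    shift 2
    # Step 2: create the job directory (no error if it already exists).
    mkdir -p "$run_dir" || return 1
    # Step 4: copy the initialization files; edit the copies afterwards.
    cp "$ini_src"/*.ini "$run_dir"/ || return 1
    # Step 3: copy the reference file and the read files.
    for f in "$@"; do
        cp "$f" "$run_dir"/ || return 1
    done
}

# Example call, using the paths from the steps above:
#   prepare_run_dir /data/user/soap/run1 /usr/local/soap/soap3 \
#       ref1.fa infile1.fa infile2.fa
```

The function only copies files. Editing the .ini parameters and submitting the job with qsub remain manual steps.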

Running an interactive SOAP3 job

It may be useful for debugging purposes to run SOAP3 jobs interactively. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf% qsub -I -l nodes=1:gpu2050
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p1234]$ module load soap3
[user@p1234]$ cd /data/user/soap/run1
[user@p1234]$ soap3-dp-builder ref1.fa
[user@p1234]$ BGS-Build ref1.fa.index
[user@p1234]$ soap3-dp pair ref1.fa.index infile1.fa infile2.fa -u 500 -v 200
[user@p1234]$ make-view-succinct.sh infile1.fa

[user@p1234]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$ 

Make sure to exit the node once you have finished your run.

Documentation

SOAP3
Other SOAP packages