Biowulf at the NIH
SSAHA2 on Biowulf

SSAHA2 (Sequence Search and Alignment by Hashing Algorithm) is a pairwise sequence alignment program designed for the efficient mapping of sequencing reads onto genomic reference sequences. It was developed at the Wellcome Trust Sanger Institute, UK.

SSAHA2 supports reads from most sequencing platforms (ABI-Sanger, Roche 454, Illumina-Solexa) and a range of output formats (SAM, CIGAR, PSL etc.). A pile-up pipeline for analysis and genotype calling is available as a separate package.

The environment variables must be set before SSAHA2 can be run. The easiest way to do this is to use the module commands, as in the example below.

$ module avail ssaha2
----------------------------------- /usr/local/Modules/3.2.9/modulefiles -----------------------------------
ssaha2/2.5.3

$ module load ssaha2

$ module list
Currently Loaded Modulefiles:
  1) ssaha2/2.5.3

$ module unload ssaha2

$ module load ssaha2/2.5.3

$ module list
Currently Loaded Modulefiles:
  1) ssaha2/2.5.3

$ module show ssaha2
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/ssaha2/2.5.3:

module-whatis    Sets up SSAHA2 2.5.3
prepend-path     PATH /usr/local/apps/ssaha2/2.5.3
prepend-path     PATH /usr/local/pileup
-------------------------------------------------------------------

Sample Sessions On Biowulf

Submitting a single SSAHA2 batch job

1. Create a directory and put the input files in there. In this example, the directory is /data/$USER/ssaha2/run1

2. Create a script file similar to the one below.

#!/bin/bash
# This file is runSsaha2
#
#PBS -N ssaha2
#PBS -m be
#PBS -k oe

module load ssaha2

cd /data/$USER/ssaha2/run1
ssaha2Build -454 -save htab NCBI36.fa
ssaha2 -454 -output sam -outfile mapped.sam -save htab hs454.fq

3. Submit the script using the 'qsub' command on Biowulf, as in the example below. Users are advised to run benchmarks to determine which type of node is suitable for their jobs.

qsub -l nodes=1:g8 /data/$USER/runSsaha2
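Because a batch job may wait in the queue before it starts, it can help to fail early when an input file is missing. The following is a minimal sketch of such a check; the function name require_inputs is hypothetical and is not part of SSAHA2 or of the Biowulf tools.

```shell
#!/bin/bash
# Sketch: stop early if an expected input file is missing or empty.
# require_inputs is a hypothetical helper, not an SSAHA2 command.
require_inputs() {
    local f
    for f in "$@"; do
        # -s is true only if the file exists and has a size greater than zero
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            return 1
        fi
    done
}
```

In the runSsaha2 script above, the function could be defined near the top and called after the cd line, for example as: require_inputs NCBI36.fa hs454.fq || exit 1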

Submitting a swarm of SSAHA2 jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently. Set up a swarm command file along the following lines:

cd /data/user/mydir1; ssaha2 -454 -output sam -outfile mapped.sam -save htab file1.fastq
cd /data/user/mydir2; ssaha2 -454 -output sam -outfile mapped.sam -save htab file2.fastq
cd /data/user/mydir3; ssaha2 -454 -output sam -outfile mapped.sam -save htab file3.fastq
cd /data/user/mydir4; ssaha2 -454 -output sam -outfile mapped.sam -save htab file4.fastq
[...]
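When there are many run directories, the command file can be generated rather than typed by hand. The sketch below assumes a layout matching the lines above: one subdirectory per run under a common base directory, each holding a single read file named *.fastq and a hash table saved as htab. The function name make_swarm_cmdfile is hypothetical, and paths containing spaces are not handled.

```shell
#!/bin/bash
# Sketch: write one swarm command line per read file found under <base>/*/.
# Assumes one *.fastq file per run directory, since every line writes
# its output to mapped.sam in that directory.
make_swarm_cmdfile() {
    local base=$1 out=$2
    local fq dir
    : > "$out"                        # start with an empty command file
    for fq in "$base"/*/*.fastq; do
        [ -e "$fq" ] || continue      # the glob matched nothing
        dir=$(dirname "$fq")
        printf 'cd %s; ssaha2 -454 -output sam -outfile mapped.sam -save htab %s\n' \
            "$dir" "$(basename "$fq")" >> "$out"
    done
}
```

For example, make_swarm_cmdfile /data/user cmdfile would write the command file to cmdfile in the current directory.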

Submit this job with:

swarm -f cmdfile --module ssaha2

By default, each line of the command file is executed on one processor core of a node and can use up to 1 GB of memory. If an ssaha2 command requires more than 1 GB of memory, you need to specify this to swarm using the -g # flag, where # is the number of GB of memory required. For example, if each ssaha2 command requires 5 GB of memory, submit the swarm with:

swarm -g 5 -f cmdfile --module ssaha2
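One way to choose a value for -g is to measure the peak memory of a single representative command and round it up to whole gigabytes. The sketch below only does the rounding. It assumes the peak figure is available in kilobytes, for example from the "Maximum resident set size (kbytes)" line that GNU time prints with /usr/bin/time -v, if that tool is installed on the node. The function name kb_to_gb_ceil is hypothetical, and a gigabyte is taken here as 1024 * 1024 KB.

```shell
#!/bin/bash
# Sketch: round a peak memory figure given in kilobytes up to whole gigabytes.
# kb_to_gb_ceil is a hypothetical helper; 1 GB is assumed to be 1024*1024 KB.
kb_to_gb_ceil() {
    local kb=$1
    local per_gb=$((1024 * 1024))
    # Integer ceiling division: add (divisor - 1) before dividing.
    echo $(( (kb + per_gb - 1) / per_gb ))
}
```

It is usually sensible to leave some headroom above the measured value, since memory use can vary with the input.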

For more information regarding running swarm, see swarm.html

Documentation

ftp://ftp.sanger.ac.uk/pub4/resources/software/ssaha2/ssaha2-manual.pdf