SHRiMP on Biowulf

SHRiMP is a software package for aligning genomic reads against a target genome. It was developed primarily with the large volumes of short reads produced by next-generation sequencing machines in mind, as well as Applied Biosystems' colourspace genomic representation.

Program Location

/usr/local/shrimp
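
The examples below call the SHRiMP binaries by their full paths. As an optional convenience, you could instead add the package's bin directory (the directory used by the commands below) to your PATH:

# Optional convenience, assuming the binaries live under /usr/local/shrimp/bin
# as in the examples on this page
export PATH=/usr/local/shrimp/bin:$PATH
which gmapper-cs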

Sample Sessions on Biowulf

Submitting a single SHRiMP batch job

1. SHRiMP has an option to run multi-threaded, which means that a single SHRiMP run can use all available processors on a node. If you are using the multi-threading option (-N), the number of threads should be less than or equal to the number of processors on the node. The Biowulf cluster is heterogeneous, and the number of processors (cores) on each node type is reported by the 'freen' command. For example, after typing 'freen' on the Biowulf head node, the second table, 'Free nodes by total node memory', lists two kinds of nodes under the g8 column: o2800:dc and o2600:dc. Both have 8 cores (processors) per node, so you should use -N 8 if these node types are requested on the 'qsub' command line.

2. In the following example, the input files can be copied from /usr/local/src/shrimp/example/.

3. Create a batch script file containing lines similar to those below. Modify the paths to match your own directories before running.

#!/bin/bash
# This file is runShrimp
#
#PBS -N Shrimp
#PBS -m be
#PBS -k oe

cd /data/username/shrimp/run1
# Run gmapper-cs with 8 threads (-N 8, matching the number of cores on the node);
# mappings go to map.out and diagnostic messages to map.log
/usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -N 8 -o 5 -h 80% >map.out 2>map.log

4. Submit the script using the 'qsub' command on Biowulf. In this example, the job is submitted to a g8 node, which has 8 cores (hence '-N 8' in the script above).

qsub -l nodes=1:g8 /data/username/runShrimp
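
Putting steps 2-4 together, a complete session might look like the sketch below. The paths are illustrative; replace 'username' with your own account and adjust the run directory as needed.

# Create a run directory and copy in the example inputs (step 2)
mkdir -p /data/username/shrimp/run1
cp /usr/local/src/shrimp/example/* /data/username/shrimp/run1/
# Submit the batch script from step 3 to a g8 node (step 4)
qsub -l nodes=1:g8 /data/username/runShrimp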

Submitting a swarm of SHRiMP jobs

1. Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

cd /data/username/shrimp/run1;/usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -o 5 -h 80% >map.out 2>map.log
cd /data/username/shrimp/run2;/usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -o 5 -h 80% >map.out 2>map.log
cd /data/username/shrimp/run3;/usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -o 5 -h 80% >map.out 2>map.log
cd /data/username/shrimp/run4;/usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -o 5 -h 80% >map.out 2>map.log
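
Rather than typing each line by hand, a command file like this can be generated with a small shell loop (a sketch, assuming the run1-run4 directory layout above):

# Write one gmapper-cs command per run directory into the swarm command file
for i in 1 2 3 4; do
    echo "cd /data/username/shrimp/run$i;/usr/local/shrimp/bin/gmapper-cs test_S1_F3.csfasta ch11_12_validated.fasta -o 5 -h 80% >map.out 2>map.log" >> /data/username/cmdfile
done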


One swarm flag, '-f', is required; two others, '-t' and '-g', are the flags you will most likely need to specify when submitting a swarm job:

-f: the swarm command file name above (required)
-t: number of processors per node to use for each command line in the swarm file above (optional)
-g: GB of memory needed by each command line in the swarm file above (optional)

By default, each command line above is executed on one processor core of a node and may use 1 GB of memory. If this is not what you want, specify the '-t' and '-g' flags when you submit the job on Biowulf.

For example, if each command line needs 10 GB of memory instead of the default 1 GB, make sure swarm knows this by including the '-g 10' flag:

biowulf> swarm -g 10 -f cmdfile
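
If each command in the file also uses SHRiMP's multi-threading (for example, '-N 4' added to each gmapper-cs line), tell swarm to reserve a matching number of cores per command with '-t' (a sketch; match the '-t' value to your '-N' value):

biowulf> swarm -t 4 -g 10 -f cmdfile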

For more information on running swarm, see swarm.html

Documentation

http://compbio.cs.toronto.edu/shrimp/README