Biowulf at the NIH
Meerkat on Biowulf

Meerkat identifies structural variants from whole-genome sequencing data using patterns of discordant read clusters. It was developed in Peter Park's lab at Harvard Medical School.

Submitting a single batch job

1. Create a script file, similar to the one below:

#!/bin/bash
# This file is runMeerkat
#
#PBS -N meerkat
#PBS -m be
#PBS -k oe


cd /data/userID/meerkat/run1
perl /usr/local/apps/meerkat/current/scripts/pre_process.pl \
	-b /usr/local/apps/meerkat/Meerkat.example/bam/example.sorted.bam \
	-I /fdb/igenomes/Homo_sapiens/UCSC/hg18/Sequence/BWAIndex/genome.fa \
	-A /fdb/igenomes/Homo_sapiens/UCSC/hg18/Sequence/WholeGenomeFasta/genome.fa.fai \
	-W /usr/local/apps/bwa/current/ \
	-S /usr/local/apps/samtools/
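
Before submitting, it can help to verify that the input files referenced by the script exist and are readable. A minimal sketch (check_inputs is a hypothetical helper, not part of Meerkat; the paths are the example paths from the script above):

```shell
# check_inputs: print any argument that is not a readable file.
# (Hypothetical pre-flight helper; substitute your own paths.)
check_inputs() {
    for f in "$@"; do
        [ -r "$f" ] || echo "missing or unreadable: $f"
    done
}

# Example paths from the script above:
check_inputs \
    /usr/local/apps/meerkat/Meerkat.example/bam/example.sorted.bam \
    /fdb/igenomes/Homo_sapiens/UCSC/hg18/Sequence/BWAIndex/genome.fa
```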

2. Submit the script using the 'qsub' command on Biowulf:

qsub -l nodes=1:g24:c24 /data/username/runMeerkat

In this case, the job is being run on a g24 node (24 GB of memory).

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). A sample file appears below. Note that each command must be a single line; do not add line breaks within a command. Also note that each job runs in its own subdirectory.

cd /data/user/run1; meerkat command1; meerkat command2
cd /data/user/run2; meerkat command1; meerkat command2
cd /data/user/run3; meerkat command1; meerkat command2
....
cd /data/user/run10; meerkat command1; meerkat command2
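
Rather than typing each line by hand, a shell loop can generate such a file. A minimal sketch, assuming ten run directories named as in the sample above (the meerkat commands are the placeholders from the sample, not real Meerkat invocations):

```shell
# Write one swarm command line per run directory (run1 .. run10).
for i in $(seq 1 10); do
    echo "cd /data/user/run$i; meerkat command1; meerkat command2"
done > cmdfile
```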

Swarm requires one flag (-f); users will probably also want to specify -t, -g, and --module.

-f: the swarm command file name above (required)
-t: number of processors to use for each line of the swarm command file (optional)
-g: GB of memory needed for each line of the swarm command file (optional)

You need to tell swarm how many cores to use for each command; by default it allots one core per command. This is changed with the -t switch (8 cores per command in this example). In addition, each command may require, say, 12 GB of memory, which is specified with the -g 12 switch. Thus, this swarm command file can be submitted with:

biowulf% swarm -t 8 -g 12 -f cmdfile

Users may need to run a few test jobs to determine how much memory is required. Set up a single job, submit it to a g24 node, and examine the job output, which lists the memory used by that job.

For more information on running swarm, see swarm.html

Documentation

http://compbio.med.harvard.edu/Meerkat/