Biowulf at the NIH
RSS Feed
Jannovar on Biowulf

Jannovar, developed by the CBB group at the Institute for Medical Genetics and Human Genetics at Charité-Universitätsmedizin Berlin, is a stand-alone Java application as well as a Java library designed for use in larger software frameworks for exome analysis.

The *.jar files are located under /usr/local/apps/jannovar

The UCSC and Ensembl transcript data used by Jannovar is stored in the same directory.

Submitting a single Jannovar batch job

1. Create a script file, similar to the one below:

#!/bin/bash
# This file is runJannovar
#PBS -N jannovar
#PBS -m be
#PBS -k oe

cd /data/userID/jannovar/run1
java -jar /usr/local/apps/jannovar/Jannovar.jar -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf

2. Submit the script using the 'qsub' command on Biowulf:

qsub -l nodes=1:g24:c24 /data/username/runJannovar

In this example, the job runs on a g24 node (a node with 24 GB of memory).
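If a Jannovar run approaches the node's memory limit, the JVM heap can be capped explicitly. The sketch below reuses the paths from the script above; -Xmx is a standard JVM option, and the 20 GB value is an illustrative choice for a g24 node, not a site requirement.

```shell
# Sketch: cap the Java heap below the 24 GB available on a g24 node.
# -Xmx is a standard JVM flag; 20g is an illustrative value, not a site default.
cd /data/userID/jannovar/run1
java -Xmx20g -jar /usr/local/apps/jannovar/Jannovar.jar \
    -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf
```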

Submitting a swarm of Jannovar jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file. Note that each command must be a single line; do not add any line breaks within a command. Also note that each job runs in its own subdirectory. This is required because the default output directory for all jobs is identical, so jobs run in the same directory would overwrite each other's output.

cd /data/user/run1; java -jar /usr/local/apps/jannovar/Jannovar.jar -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf
cd /data/user/run2; java -jar /usr/local/apps/jannovar/Jannovar.jar -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf
cd /data/user/run3; java -jar /usr/local/apps/jannovar/Jannovar.jar -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf
...
cd /data/user/run10; java -jar /usr/local/apps/jannovar/Jannovar.jar -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf
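Rather than typing ten near-identical lines by hand, the command file can be generated with a short shell loop. This is a sketch assuming the run1 through run10 directory layout shown above; adjust the paths to match your own setup.

```shell
#!/bin/bash
# Generate one swarm command line per run directory (run1..run10),
# writing the result to a file named cmdfile in the current directory.
for i in $(seq 1 10); do
    echo "cd /data/user/run$i; java -jar /usr/local/apps/jannovar/Jannovar.jar -D /usr/local/apps/jannovar/ucsc_hg19.ser -V example.vcf"
done > cmdfile
```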

Swarm requires one flag, -f; users will probably also want to specify -t, -g, and --module.

-f: the name of the swarm command file above (required)
-t: the number of processors to allocate to each command in the swarm file above (optional)
-g: GB of memory needed by each command in the swarm file above (optional)
--module: environment module(s) to load before running each command (optional)

You need to tell swarm how many cores to use for each command. This is done with the -t switch (8 cores in this example). In addition, each command may require, say, 12 GB of memory; this is specified with the -g 12 switch. Thus, this swarm command file can be submitted with:

biowulf> swarm -t 8 -g 12 -f cmdfile

Users may need to run a few test jobs to determine how much memory their jobs use. Set up a single Jannovar job, then submit it to a g24 node. The output from the job will list the memory used by that job.

For more information about running swarm, see swarm.html.