High-Performance Computing at the NIH

Delly on Helix & Biowulf

DELLY is an integrated structural variant prediction method that can detect deletions, tandem duplications, inversions and translocations at single-nucleotide resolution in short-read massively parallel sequencing data. It uses paired-ends and split-reads to sensitively and accurately delineate genomic rearrangements throughout the genome.

 

Delly's environment variables must be set before it can be run. The easiest way to do this is with the module commands, as in the example below.

[user@helix]$ module avail delly
----------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------
delly/0.0.11(default)

[user@helix]$ module load delly

[user@helix]$ module list
Currently Loaded Modulefiles:
  1) delly/0.0.11

[user@helix]$ module unload delly

[user@helix]$ module load delly/0.0.11

[user@helix]$ module list
Currently Loaded Modulefiles:
  1) delly/0.0.11

[user@helix]$ module show delly
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/delly/0.0.11:

module-whatis    Sets up delly 0.0.11 
prepend-path     PATH /usr/local/apps/delly/0.0.11  
-------------------------------------------------------------------

  
Running Delly on Helix

Sample session:

$ module load delly
$ cd /data/$USER
$ delly lib1.bam lib2.bam libN.bam

 

Delly can run multi-threaded. On Helix, use a maximum of 4 threads by setting the following before you run delly:
export OMP_NUM_THREADS=4
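
As a minimal sketch of the thread setting above, the following caps OMP_NUM_THREADS at the Helix limit before Delly is started. The helper name cap_threads is hypothetical and is not part of Delly or the modules system; only the limit of 4 comes from this page.

```shell
# Hypothetical helper: print the requested thread count, capped at a maximum.
cap_threads() {
    # $1 = requested thread count, $2 = allowed maximum
    if [ "$1" -gt "$2" ]; then
        echo "$2"
    else
        echo "$1"
    fi
}

# Keep the current setting if there is one, otherwise use the Helix maximum of 4.
OMP_NUM_THREADS=$(cap_threads "${OMP_NUM_THREADS:-4}" 4)
export OMP_NUM_THREADS
echo "Delly will run with $OMP_NUM_THREADS threads"
```

With the variable exported, load the module and run delly as in the sample session above.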

Running a Delly job on Biowulf

Set up a batch script along the following lines:

#!/bin/bash

module load delly
cd /data/user/somedir
echo "Running $OMP_NUM_THREADS threads"

delly -t DEL -o del.vcf -g ref.fa sample1.sort.bam ... sampleN.sort.bam
Submit with:
qsub -v OMP_NUM_THREADS=16 -l nodes=1:c16 myjobscript
Delly is threaded via OpenMP, so it can run multiple threads on the cores of a single node, but it cannot run on more than one node. Make sure that the number of threads (in this case, OMP_NUM_THREADS=16) matches the number of cores requested (in this case, c16). If you run more threads than there are cores, you will overload the node and performance will suffer.
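
The thread and core matching described above could be checked inside the batch script before delly starts. This is only a sketch: the function name threads_fit is hypothetical, and it assumes that nproc (from GNU coreutils) reports the number of cores on the allocated node.

```shell
# Hypothetical helper: succeed only if the thread count fits on the cores.
threads_fit() {
    # $1 = requested threads, $2 = available cores
    [ "$1" -le "$2" ]
}

cores=$(nproc)                   # assumption: cores visible on this node
threads=${OMP_NUM_THREADS:-1}    # fall back to one thread if unset

if threads_fit "$threads" "$cores"; then
    echo "OK: $threads threads on $cores cores"
else
    echo "WARNING: $threads threads would overload $cores cores" >&2
fi
```

Placed before the delly line in the batch script, this prints a warning in the job's error output when the qsub request and OMP_NUM_THREADS disagree.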

Running a swarm of Delly jobs on Biowulf

Set up a swarm command file along the following lines:

cd /data/user/somedir1; delly -t DEL -o del1.vcf -g ref.fa sample1.sort.bam ... sampleN.sort.bam
cd /data/user/somedir2; delly -t DEL -o del2.vcf -g ref.fa sample1.sort.bam ... sampleN.sort.bam
cd /data/user/somedir3; delly -t DEL -o del3.vcf -g ref.fa sample1.sort.bam ... sampleN.sort.bam
[...etc...]
Submit with:
biowulf% swarm -f swarmfile --module delly
Each Delly run in the swarm is single-threaded. Running multi-threaded Delly jobs via swarm is complicated, because you do not know in advance how many cores the allocated nodes will have.
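
Because the swarm command file follows a regular pattern, it can be generated with a short loop rather than typed by hand. The sketch below is illustrative only: make_swarm_line is a hypothetical helper, and the two BAM file names stand in for your own sample list, which this page leaves unspecified.

```shell
# Hypothetical helper: print one swarm command line.
make_swarm_line() {
    # $1 = directory/output index, remaining arguments = BAM files
    idx=$1
    shift
    echo "cd /data/user/somedir$idx; delly -t DEL -o del$idx.vcf -g ref.fa $*"
}

: > swarmfile    # start with an empty command file
for n in 1 2 3; do
    # Placeholder BAM names; substitute the files for each run.
    make_swarm_line "$n" sample1.sort.bam sample2.sort.bam >> swarmfile
done
cat swarmfile
```

The resulting swarmfile contains one line per run and can be submitted with the swarm command shown above.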

Documentation

http://www.embl.de/~rausch/delly.html