VarScan on Biowulf

VarScan is a platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples. Given data for a single sample, VarScan identifies and filters germline variants based on read counts, base quality, and allele frequency. Given data for a tumor-normal pair, VarScan also determines the somatic status of each variant (Germline, Somatic, or LOH) by comparing read counts between samples.


Programs Location

/usr/local/apps/varscan

Jar files for all available versions of VarScan are located in this directory.
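
For example, to see which jar files (and therefore which versions) are currently installed, list the directory; the exact filenames will change as versions are added:

[user@biowulf]$ ls /usr/local/apps/varscan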

Submitting a single batch job

1. Create a script file. Here is a sample batch script:

#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N VarScanjob
#PBS -m be
#PBS -k oe

cd /data/user/mydir

# call SNPs, then indels, from the pileup file
java -Xmx3000m -jar /usr/local/apps/varscan/current/VarScan.v2.3.6.jar pileup2snp mypileup.file --min-coverage
java -Xmx3000m -jar /usr/local/apps/varscan/current/VarScan.v2.3.6.jar pileup2indel mypileup.file --min-coverage

Note: Replace '2.3.6' with the version you want to run. In this example VarScan is given 3000 MB (3 GB) of memory via the Java -Xmx flag; adjust this value to suit your own job.
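
As an illustration (the 8 GB figure below is arbitrary), increasing the Java heap only requires changing the -Xmx value; if you do so, remember to request a node with enough memory when submitting (see step 2):

java -Xmx8000m -jar /usr/local/apps/varscan/current/VarScan.v2.3.6.jar pileup2snp mypileup.file --min-coverage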

2. Submit the script using the 'qsub' command on Biowulf. Users are encouraged to run benchmarks to determine which type of node is suitable for their jobs.

To submit to a node with more than 3 GB of memory (all current nodes have more than 4 GB):
[user@biowulf]$ qsub -l nodes=1 /data/username/theScriptFileAbove
Use 'freen' to see available node types.


Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g., /data/username/cmdfile). Here is a sample file:

java -Xmx2000m -jar /usr/local/apps/varscan/current/VarScan.*.*.*.jar pileup2snp mypileup1.file --min-coverage
java -Xmx2000m -jar /usr/local/apps/varscan/current/VarScan.*.*.*.jar pileup2snp mypileup2.file --min-coverage
java -Xmx2000m -jar /usr/local/apps/varscan/current/VarScan.*.*.*.jar pileup2snp mypileup3.file --min-coverage
[...]     
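
If you have many pileup files, one convenient way to generate this command file is with a short shell loop; the directory and file-naming pattern below are placeholders to adjust for your own data:

for f in /data/username/*.file; do
    echo "java -Xmx2000m -jar /usr/local/apps/varscan/current/VarScan.*.*.*.jar pileup2snp $f --min-coverage"
done > /data/username/cmdfile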

Submit this job with

$ swarm -f cmdfile

By default, each command line above is executed on one processor core of a node and is allocated 1 GB of memory. If each of your VarScan command lines requires more than 1 GB of memory, specify the memory required with the '-g #' flag to swarm, where # is the number of gigabytes of memory required by a single command. For example, if each VarScan command in the swarm file above requires 2 GB of memory (matching the -Xmx2000m Java heap), submit the job with:

[user@biowulf]$ swarm -g 2 -f cmdfile

For more information about running swarm, see the swarm documentation (swarm.html).


Running an interactive job

Users may occasionally need to run jobs interactively. Such jobs should not be run on the Biowulf login node; instead, allocate an interactive node as described below and run the job there.

[user@biowulf] $ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ cd /data/userID/VarScan/run1
[user@p4]$ java -Xmx2000m -jar /usr/local/apps/varscan/current/VarScan.*.*.*.jar pileup2snp mypileup1.file --min-coverage
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$

Users may add a node property to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory for an interactive job:

[user@biowulf]$ qsub -I -l nodes=1:g24:c16


Documentation

http://varscan.sourceforge.net/using-varscan.html