VarScan is a platform-independent, technology-independent software tool for identifying SNPs and indels in massively parallel sequencing of individual and pooled samples. Given data for a single sample, VarScan identifies and filters germline variants based on read counts, base quality, and allele frequency. Given data for a tumor-normal pair, VarScan also determines the somatic status of each variant (Germline, Somatic, or LOH) by comparing read counts between samples.
Jar files for all available versions of VarScan are located in this directory. /usr/local/VarScan/VarScan.jar is a link to the latest version.
1. Create a script file. Here is a sample batch script:
#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N VarScanjob
#PBS -m be
#PBS -k oe

# Aliases are not expanded in non-interactive shells by default,
# so enable alias expansion before defining the VarScan alias.
shopt -s expand_aliases
alias VarScan="java -Xmx2000m -jar /usr/local/VarScan/VarScan.jar"

cd /data/user/mydir
VarScan pileup2snp mypileup.file --min-coverage
VarScan pileup2indel mypileup.file --min-coverage
2. Submit the script using the 'qsub' command on Biowulf. Users are recommended to run benchmarks to determine what kind of node is suitable for their jobs. For example, you might submit to a node with 4 GB of memory (a little more than the 2 GB required by the VarScan job, for safety).
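A submission command along these lines should work; note that the node property name (g4) and the script path are illustrative assumptions, not part of the original instructions, and may differ on your system:

```shell
# Hypothetical example: request a node with at least 4 GB of memory
# via a node property. The property name "g4" and the script path
# /data/user/mydir/myscript are placeholders -- adjust for your setup.
qsub -l nodes=1:g4 /data/user/mydir/myscript
```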
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:
java -Xmx2000m -jar /usr/local/VarScan/VarScan.jar pileup2snp mypileup1.file --min-coverage
java -Xmx2000m -jar /usr/local/VarScan/VarScan.jar pileup2snp mypileup2.file --min-coverage
java -Xmx2000m -jar /usr/local/VarScan/VarScan.jar pileup2snp mypileup3.file --min-coverage
[...]
Submit this job with
swarm -f cmdfile
By default, each line of the command file above is executed on one processor core of a node and may use 1 GB of memory. If each of your VarScan command lines requires more than 1 GB of memory, specify the memory required using the '-g #' flag to swarm, where # is the number of gigabytes of memory required by a single command. For example, if each VarScan command in the swarm file above requires 10 GB of memory, submit the job with:
[user@biowulf]$ swarm -g 10 -f cmdfile
For more information regarding running swarm, see swarm.html
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.
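The session transcript below begins after an interactive node has been allocated. The allocation command itself is not shown in the transcript; a typical PBS request might look like the following sketch (the exact flags are an assumption based on common PBS usage, not taken from this page):

```shell
# Hypothetical example: request a single interactive node.
# -I asks for an interactive session; -l nodes=1 requests one node.
qsub -I -l nodes=1
```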
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ cd /data/userID/VarScan/run1
[user@p4]$ java -Xmx2000m -jar /usr/local/VarScan/VarScan.jar pileup2snp mypileup1.file --min-coverage
qsub: job 2236960.biobos completed
Users may add a node property to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this:
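A request along these lines should work; the node property name (g24) is an illustrative assumption and may be named differently on your cluster:

```shell
# Hypothetical example: request an interactive node with 24 GB of memory.
# The property name "g24" is a placeholder for the site-specific
# node property denoting 24 GB nodes.
qsub -I -l nodes=1:g24
```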