Biowulf at the NIH
PennCNV on Biowulf

PennCNV is a software tool for Copy Number Variation (CNV) detection from SNP genotyping arrays. It currently handles signal intensity data from Illumina and Affymetrix arrays. With appropriately formatted input files, it can also handle other types of SNP arrays and oligonucleotide arrays.

PennCNV implements a hidden Markov model (HMM) that integrates multiple sources of information to infer CNV calls for individual genotyped samples. It differs from segmentation-based algorithms in that it considers the SNP allelic ratio distribution as well as other factors, in addition to signal intensity alone. PennCNV can also optionally use family information to generate family-based CNV calls via several different algorithms. Furthermore, PennCNV can generate CNV calls for a specific set of candidate CNV regions through a validation-calling algorithm.
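To make the HMM-based calling concrete, here is a hedged sketch of a direct detect_cnv.pl invocation for individual-based calling. The -test, -hmm, -pfb, -log, and -out options are standard PennCNV flags; the file names (hh550.hmm, hh550.pfb, sample1.txt) are placeholders for your own HMM model file, population-frequency-of-B-allele file, and signal intensity file:

```shell
# Hedged sketch: individual-based CNV calling with PennCNV.
# File names below are placeholders, not files shipped with this page.
PENNCNV_CMD="detect_cnv.pl -test -hmm hh550.hmm -pfb hh550.pfb \
-log sample1.log -out sample1.rawcnv sample1.txt"

if command -v detect_cnv.pl >/dev/null 2>&1; then
    # On Biowulf, run 'module load penncnv' first so this is on PATH.
    $PENNCNV_CMD
else
    echo "detect_cnv.pl not on PATH; on Biowulf, run: module load penncnv"
fi
```

The resulting .rawcnv file lists one CNV call per line; the runex.pl examples below exercise this same program through a wrapper.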


The environment variables need to be set properly first. The easiest way to do this is with the module commands, as in the example below.

$ module avail penncnv
-------------------- /usr/local/Modules/3.2.9/modulefiles ----------------------
penncnv/current(default)

$ module load penncnv

$ module list
Currently Loaded Modulefiles:
  1) penncnv/current

$ module unload penncnv

$ module load penncnv/current

$ module show penncnv
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/penncnv/current:

module-whatis    Sets up penncnv
prepend-path     PATH /usr/local/apps/penncnv/current
-------------------------------------------------------------------


Sample Sessions On Biowulf

PennCNV sample files can be copied from: /usr/local/apps/penncnv/current/example
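For instance, to copy them into your own working directory (the destination path here is just an illustration; use whatever layout you prefer):

```shell
# Copy the PennCNV example files into a private working directory.
# SRC is the install path given above; DEST is an assumed layout.
SRC=/usr/local/apps/penncnv/current/example
DEST="${DEST:-$HOME/penncnv/run1}"

mkdir -p "$DEST"
if [ -d "$SRC" ]; then
    cp -r "$SRC"/. "$DEST"/
else
    echo "example directory not found; this path exists only on Biowulf"
fi
```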

Submitting a single PennCNV batch job

1. Create a batch script file. Modify the paths below before running.

#!/bin/bash
# This file is penncnv
#
#PBS -N penncnv
#PBS -m be
#PBS -k oe
module load penncnv
cd /home/$USER/penncnv/run1
./runex.pl --path_detect_cnv detect_cnv.pl 1
./runex.pl --path_detect_cnv detect_cnv.pl 2
./runex.pl --path_detect_cnv detect_cnv.pl 3
./runex.pl --path_detect_cnv detect_cnv.pl 4
./runex.pl --path_detect_cnv detect_cnv.pl 5
./runex.pl --path_detect_cnv detect_cnv.pl 6
./runex.pl --path_visualize_cnv visualize_cnv.pl 7
./runex.pl --path_convert_cnv convert_cnv.pl 8
./runex.pl --path_convert_cnv convert_cnv.pl 9
./runex.pl --path_filter_cnv filter_cnv.pl 10
./runex.pl --path_compare_cnv compare_cnv.pl 11
./runex.pl --path_compare_cnv compare_cnv.pl 12
./runex.pl --path_infer_allele infer_snp_allele.pl 13
./runex.pl --path_infer_allele infer_snp_allele.pl 14

2. Submit the script using the 'qsub' command, e.g.

$ qsub -l nodes=1 /home/$USER/ScriptAbove

Submitting a swarm of PennCNV jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /home/username/cmdfile) containing one command per line. Each line is executed on one processor core as a single job. Create a separate wrapper file for each job; below is a sample wrapper (here named run1file) for one job:

module load penncnv
cd /home/user/penncnv/run1
./runex.pl --path_detect_cnv detect_cnv.pl 1
./runex.pl --path_detect_cnv detect_cnv.pl 2
./runex.pl --path_detect_cnv detect_cnv.pl 3
./runex.pl --path_detect_cnv detect_cnv.pl 4
./runex.pl --path_detect_cnv detect_cnv.pl 5
./runex.pl --path_detect_cnv detect_cnv.pl 6
./runex.pl --path_visualize_cnv visualize_cnv.pl 7
./runex.pl --path_convert_cnv convert_cnv.pl 8
./runex.pl --path_convert_cnv convert_cnv.pl 9
./runex.pl --path_filter_cnv filter_cnv.pl 10
./runex.pl --path_compare_cnv compare_cnv.pl 11
./runex.pl --path_compare_cnv compare_cnv.pl 12
./runex.pl --path_infer_allele infer_snp_allele.pl 13
./runex.pl --path_infer_allele infer_snp_allele.pl 14

Below is a sample swarm command file containing several PennCNV jobs:

run1file
run2file
run3file
run4file
....
....
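Rather than writing each wrapper file by hand, a short loop can generate both the wrappers and the swarm command file. This is a sketch under assumed paths: it presumes the run directories (run1, run2, ...) each hold a copy of the example files, and it writes "bash runNfile" on each line so the wrappers need not be executable:

```shell
#!/bin/bash
# Generate per-job wrapper scripts (run1file, run2file, ...) and a
# swarm command file that invokes each one. Paths are illustrative.
NJOBS=4
CMDFILE=cmdfile
: > "$CMDFILE"                      # truncate/create the command file

for i in $(seq 1 "$NJOBS"); do
    wrapper="run${i}file"
    cat > "$wrapper" <<EOF
module load penncnv
cd /home/\$USER/penncnv/run${i}
./runex.pl --path_detect_cnv detect_cnv.pl 1
EOF
    echo "bash $wrapper" >> "$CMDFILE"
done
```

The resulting cmdfile is then submitted with swarm as shown below.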


By default, each line of the command file is executed on one processor core of a node and may use 1 GB of memory. If this is not what you want, specify the '-g' flag when you submit the job on Biowulf.

For example, if each job needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:

$ swarm -g 10 -f cmdfile

For more information on running swarm, see the swarm documentation (swarm.html).


Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
      qsub: job 2236960.biobos ready 
$ cd /data/$USER/example
$ module load penncnv
$ ./runex.pl --path_detect_cnv detect_cnv.pl 6
$ exit
qsub: job 2236960.biobos completed

You can add node properties to the qsub command to request a specific type of interactive node. For example, if you need a node with 8 GB of memory for an interactive job, do this:

$ qsub -I -l nodes=1:g8


Documentation

http://www.openbioinformatics.org/penncnv/penncnv_examples.html