Biowulf at the NIH
CoNIFER on Biowulf

CoNIFER uses exome sequencing data to find copy number variants (CNVs) and genotype the copy number of duplicated genes. Because exome capture reactions are subject to strong, systematic capture biases between sample batches, CoNIFER uses singular value decomposition (SVD) to remove these biases from exome data. Removing batch biases makes it possible to combine exome sequence from multiple experimental runs. Together with a short-read aligner such as mrsFAST, which can align reads to multiple locations, CoNIFER can robustly detect rare CNVs and estimate the copy number of duplicated genes up to ~8 copies with current exome capture kits.

The environment variables must be set before running CoNIFER. The easiest way to do this is with the module commands, as in the example below.

[user@helix]$ module avail conifer
----------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------

[user@helix]$ module load conifer

[user@helix]$ module list
Currently Loaded Modulefiles:
  1) conifer/0.2.2

[user@helix]$ module unload conifer

[user@helix]$ module load conifer/0.2.2

[user@helix]$ module list
Currently Loaded Modulefiles:
  1) conifer/0.2.2

[user@helix]$ module show conifer
module-whatis    Sets up conifer 0.2.2  
prepend-path     PATH /usr/local/Python/2.7.5/bin  
prepend-path     PATH /usr/local/apps/conifer/0.2.2/  
prepend-path     LD_LIBRARY_PATH /usr/local/hdf5-1.8.11/lib:/usr/local/Python/2.7.5/lib:/usr/local/gsl-1.15/lib:/usr/local/intel/composer_xe_2013.0.079/mkl/lib/intel64  
setenv           PYTHONPATH /usr/local/Python/2.7.5  
setenv           R_HOME /usr/local/R-2.15-64_cluster/lib64/R   
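
To confirm that loading the module changed the environment as 'module show' reports, one can test whether the conifer directory is on PATH. The following is a minimal sketch; the helper name path_contains is hypothetical and not part of the modules system.

```shell
#!/bin/bash
# Succeed if directory $1 is an entry of the colon-separated list $2
# (defaults to the current $PATH). A trailing slash on either side is ignored.
path_contains() {
    local want="${1%/}"
    local list=":${2:-$PATH}:"
    case "$list" in
        *":$want:"*|*":$want/:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# The directory below is the one listed by 'module show conifer'.
if path_contains /usr/local/apps/conifer/0.2.2; then
    echo "conifer directory is on PATH"
else
    echo "conifer directory is not on PATH; try 'module load conifer'"
fi
```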


Sample Sessions On Biowulf

Submitting a single conifer batch job

1. Create a script file containing lines similar to those below. Change the directory path to your own location before running.

# This file is runConifer
#PBS -N Conifer
#PBS -m be
#PBS -k oe
module load conifer
cd /data/$USER/conifer/run1
python /usr/local/apps/conifer/0.2.2/ analyze \
    --probes sampledata/probes.txt \
    --rpkm_dir sampledata/RPKM_data/ \
    --output analysis.hdf5 \
    --svd 6

2. Submit the script using the 'qsub' command on Biowulf

$ qsub -l nodes=1:g8 /data/$USER/runConifer
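
Because a batch job may wait in the queue before it starts, it can save time to check that the inputs named in the script exist before submitting. The sketch below assumes the directory layout used in the example script above (sampledata/probes.txt and sampledata/RPKM_data/ under the run directory); the function name check_conifer_inputs is hypothetical.

```shell
#!/bin/bash
# Check that a run directory holds the inputs named in the example job script.
# Returns 0 if both are present, 1 otherwise; problems are reported on stderr.
check_conifer_inputs() {
    local run_dir="$1"
    local status=0
    if [ ! -f "$run_dir/sampledata/probes.txt" ]; then
        echo "missing probes file: $run_dir/sampledata/probes.txt" >&2
        status=1
    fi
    if [ ! -d "$run_dir/sampledata/RPKM_data" ]; then
        echo "missing RPKM directory: $run_dir/sampledata/RPKM_data" >&2
        status=1
    fi
    return "$status"
}
```

For example, running check_conifer_inputs /data/$USER/conifer/run1 before the qsub command above would flag a mistyped path on the login node instead of inside the job.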


Submitting a swarm of conifer jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:

cd /data/$USER/conifer1; python /usr/local/apps/conifer/0.2.2/ analyze .....
cd /data/$USER/conifer2; python /usr/local/apps/conifer/0.2.2/ analyze .....
........
cd /data/$USER/conifer20; python /usr/local/apps/conifer/0.2.2/ analyze .....
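
When the run directories follow a numbered pattern like the one in the sample file, the command file can be generated rather than typed by hand. This is a sketch only: the function name make_swarm_cmdfile is hypothetical, and the command string passed to it is a placeholder for whatever conifer command you actually run in each directory.

```shell
#!/bin/bash
# Write one swarm command line per run directory.
#   $1 = parent directory      $2 = number of run directories
#   $3 = command to run        $4 = output command file
# Directory names follow the conifer1 ... coniferN pattern of the sample file.
make_swarm_cmdfile() {
    local parent="$1" n="$2" cmd="$3" out="$4"
    local i=1
    : > "$out"    # start with an empty command file
    while [ "$i" -le "$n" ]; do
        printf 'cd %s/conifer%d; %s\n' "$parent" "$i" "$cmd" >> "$out"
        i=$((i + 1))
    done
}
```

Each generated line has the same form as the sample file: a cd into the run directory followed by the command, so swarm treats every line as one independent job.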

The '-f' and '--module' options for swarm are required. One other flag, '-g', may also be needed when submitting a swarm job.

By default, each line of the command file is executed on 1 processor core of a node and uses 1 GB of memory. If this is not what you want, specify the '-g' flag when you submit the job on Biowulf.

For example, if each command line needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:

biowulf> $ swarm -g 10 -f cmdfile --module conifer

For more information regarding running swarm, see swarm.html


Submitting an interactive conifer job

1. First allocate a node from the cluster, then run commands interactively on that node. DO NOT RUN ON THE BIOWULF LOGIN NODE:

$ qsub -I -l nodes=1:g8

or, if your job requires more memory,

$ qsub -I -l nodes=1:g24:c16

2. Once the job has started and a node is allocated, run the interactive commands.

pXXX> $ cd /data/$USER/conifer
pXXX> $ module load conifer
pXXX> $ python /usr/local/apps/conifer/0.2.2/ analyze .....
pXXX> $ exit