CASAVA is the part of Illumina's sequencing analysis software that performs alignment of a sequencing run to a reference genome and subsequent variant analysis and read counting.
CASAVA executables are in /usr/local/casava/bin. The iGenomes data is maintained on Biowulf in /fdb/igenomes.
The environment variable(s) need to be set properly first. The easiest way to do this is by using the modules commands, as in the example below.
biowulf% module avail casava
---------------- /usr/local/Modules/3.2.9/modulefiles ----------------------
casava/1.8.0          casava/1.8.2(default)

biowulf% module load casava/1.8.2

biowulf% module list
Currently Loaded Modulefiles:
  1) casava/1.8.2
The following examples make use of the built-in example files provided with the CASAVA package. The procedure is explained in detail in the CASAVA User Guide. You can copy the sample files and run this example in your own /data area, as shown below. This should be done on an interactive node.
The following section gives a brief example of running CASAVA. The sections 'Submitting a Batch Job' and 'Submitting a swarm of jobs' below reuse these example commands. First, allocate an interactive node with:
qsub -I -l nodes=1:c16:g24
This will allocate a node with 16 cores (c16) and 24 GB of memory (g24) which will be plenty for this example run.
Bcl Conversion and Demultiplexing
-- Convert *.bcl files into compressed FASTQ files
-- Separate multiplexed sequence runs by index
-- Demultiplexing needs a BaseCalls directory and a sample sheet to start a run.
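For reference, a CASAVA 1.8 sample sheet is a plain CSV file with one header line and one line per lane/sample. The sketch below shows only the expected column layout; the flowcell ID, reference, index sequence, and operator values are placeholders (the sample and project names follow the Sample_AR008/Project_Demo example used below). Copy and adapt the bundled SampleSheet.csv rather than typing one from scratch:

```text
FCID,Lane,SampleID,SampleRef,Index,Description,Control,Recipe,Operator,SampleProject
FC1234ABC,1,AR008,human,ACTTGA,example sample,N,R1,operator1,Demo
```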
Create a working directory for this project and copy in the sample sheet:

$ mkdir /data/$USER/casava_example_dir
$ cd /data/$USER/casava_example_dir
$ cp /usr/local/CASAVA_v1.8.0/share/CASAVA-1.8.0/examples/biowulf/SampleSheet.csv .

Modify the copied SampleSheet.csv as needed for your run.
The standard way to run bcl conversion and demultiplexing is to first create the necessary Makefile, which configures the run, and then run 'make' in the generated output directory, which executes the calculations.
1. Enter the following command to create a makefile for demultiplexing:
$ configureBclToFastq.pl \
    --input-dir <path to your BaseCalls directory> \
    --sample-sheet SampleSheet.csv \
    --output-dir Unaligned \
    --force --ignore-missing-bcl --ignore-missing-stats
2. Change directory into the newly created Unaligned folder specified by --output-dir above:

$ cd Unaligned

3. Run the 'make' command, specifying 16 threads (-j 16) since you have allocated a 16-core node. If you allocated a different type of node, modify this number to match the number of cores on that node:

$ make -j 16
-- The above process generates demultiplexed .fastq.gz files under the Sample_AR008 and Sample_PhiX directories
Alignment

1. Copy the configureAlignment configuration file, config.txt, into your working directory and edit it:
cp /usr/local/CASAVA_v1.8.0/share/CASAVA-1.8.0/examples/biowulf/config.txt .
Edit the first two parameters, EXPT_DIR and OUT_DIR, to match the path of your working directory. If you have been following the example above exactly, they would be set to
EXPT_DIR   /data/$USER/casava_example_dir/Unaligned
OUT_DIR    /data/$USER/casava_example_dir/Aligned
2. Run configureAlignment.pl on the edited configuration file with the --make option, which creates the Aligned output directory and its Makefile (see the CASAVA User Guide for the full option list):

$ configureAlignment.pl config.txt --make
3. Change directory into the newly created Aligned folder and run 'make' for the basic analysis:

$ cd Aligned
$ make -j 16
The above process generates _export.txt.gz files in the Project_Demo tree under each Sample folder.
Variant Detection and Counting

-- The input files for CASAVA variant detection can be found in the Aligned directory generated in the configureAlignment step
-- The CASAVA build process is divided into several modules (or targets), each of which completes a major portion of the post-alignment analysis pipeline:
"sort" - bins aligned reads into separate regions of the reference genome, sorts these reads and optionally removes PCR duplicates (for paired-end reads) and finally converts these reads to BAM format.
"assembleIndels" - Is used to search for clusters of poorly aligned and anomalous reads. These clusters of reads are de-novo assembled into contains, which are aligned back to the reference to produce candidate indwells.
"callSmallVariants" - This module uses the sorted BAM files and the candidate indels predicted by the assembleIndels module to perform local read realignment and genotype SNPs and indels under a diploid gene and exon counts.Run the following command:
$ configureBuild.pl \
-id /data/$USER/casava_example_dir/Aligned/Project_Demo/Sample_AR008 \
-od /data/$USER/casava_example_dir/Aligned/Project_Demo/Sample_AR008/Build2 \
--workflowAuto -j 16 --targets all
This is the end of the test run, so you should exit from the interactive node:
p2338% exit
qsub: job 2533238.biobos completed
[user@biowulf]$
Submitting a Batch Job
The time-consuming or memory-intensive steps of CASAVA can be put into a script file and submitted to the batch system. For example, 'make -j 16' is the time-consuming step and is well suited to a batch or swarm job.
1. Create a script file like the one below:
#!/bin/bash
# This file is casavafile
#
#PBS -N casava
#PBS -m be
#PBS -k oe

cd /data/$USER/casava/run1/
make -j 16
2. Submit the script using the 'qsub' command on Biowulf:

$ qsub -l nodes=1:c16:g24 casavafile
Note: In this example, the job was submitted to a node with 24 GB of memory (g24) and 16 cores (c16), so '-j 16' was given after the 'make' command in the batch script, meaning 16 processors will be used to run 'make'. If you request a different kind of node with a different number of cores, change the '-j 16' value accordingly.
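Rather than hard-coding the thread count, a batch script can ask the node itself how many cores it has. A minimal sketch, assuming the GNU coreutils 'nproc' command is available on the node (the NCORES variable name is just illustrative):

```shell
#!/bin/bash
# Determine how many CPU cores this node has; nproc is part of
# GNU coreutils and reports the number of processing units available.
NCORES=$(nproc)

# Use that count as the make parallelism instead of a fixed -j 16.
echo "running make with -j ${NCORES}"
# make -j "${NCORES}"    # uncomment inside a real CASAVA run directory
```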
Submitting a swarm of jobs
Sometimes users have several sets of data under different directories, with the same analytical steps performed on each dataset. The 'swarm' utility can be used to submit many similar jobs at once.
Before running swarm, the appropriate configuration files should be created in each directory.
To submit a swarm job, create a swarm command file like this, called, say, 'cmdfile'.
cd /data/userid/casava1; make -j 16
cd /data/userid/casava2; make -j 16
cd /data/userid/casava3; make -j 16
....
cd /data/userid/casava20; make -j 16
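For a long list of directories, the command file can be generated with a short bash loop instead of being typed by hand. This sketch assumes the casava1 .. casava20 directory naming from the example above:

```shell
#!/bin/bash
# Generate one 'cd <dir>; make -j 16' line per dataset directory,
# following the casava1 .. casava20 naming used in the example above.
> cmdfile                      # truncate/create the swarm command file
for i in $(seq 1 20); do
    echo "cd /data/userid/casava${i}; make -j 16" >> cmdfile
done
```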
Note that each command (a single line in the file above) will require 16 cores. This value must also be given to the swarm command. Submit this job with:
swarm -t 16 -g 24 -f cmdfile
For more information regarding running swarm, see swarm.html