HiSeq Analysis Software provides fast, straightforward alignment and variant calling for whole human genomes and for libraries prepared with the Nextera Rapid Capture (NRC) exome enrichment kit. For whole human genome sequencing, HiSeq Analysis Software uses the Isaac analysis workflow, which the vendor describes as the fastest accurate sequence analysis software, with a 4–6× speed increase over existing methods. For NRC analysis, it uses BWA for alignment and GATK for variant calling. The software can be run from the command line or through a graphical user interface, the Analysis Visual Controller (AVC) software. The supported workflows are:
- Enrichment analysis workflow: analyzes DNA that has been enriched for specific target sequences using Nextera Rapid Capture. Alignment is performed with BWA and variant calling with GATK. Variants are called only within the target regions. Statistics reporting accumulates coverage and enrichment-specific statistics for each target, as well as overall metrics.
- Whole Genome Sequencing analysis workflow: uses the Isaac Aligner and Isaac Variant Caller to compare the DNA sequence in the sample(s) against the human reference genome hg19. It identifies any variants (SNPs or indels) relative to the reference sequence.
Feedback from one of our users:
A job was submitted to a g24 node and finished successfully.
In this job:
- the input was 48 samples from one HiSeq 1000 run.
- the primary input files totaled ~150–200 GB.
- after the FASTQ files were generated, the Intensities/L00X and BaseCalls/L00X directories could be deleted to save disk space.
- it was estimated that the run, left unattended (without deleting those intermediate directories), would have needed about 1 TB of disk space. The FASTQ files amount to ~170 GB and the output in the “Alignment” directory (the .bam and .vcf files being the most relevant for downstream analyses) amounts to another ~170 GB, for a total of roughly 350 GB.
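Given those numbers, it can be worth checking free space before starting. The function below is a minimal sketch (the name check_free_gb is hypothetical, not part of the hiseq module): it reads the Available column from `df -Pk` and compares it with a required size in GB (counted as 1024^3 bytes). Note that df reports free space on the filesystem, not any per-user quota.

```bash
#!/bin/bash
# Hypothetical helper: check_free_gb DIR REQUIRED_GB
# Returns 0 if DIR's filesystem has at least REQUIRED_GB free, 1 otherwise.
check_free_gb() {
  local dir=$1 need_gb=$2 avail_kb
  # -P: POSIX output, one data line per filesystem; -k: sizes in 1024-byte blocks.
  # Field 4 of the data line is the Available column.
  avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
  if [ -z "$avail_kb" ]; then
    echo "cannot determine free space for $dir" >&2
    return 1
  fi
  if [ "$avail_kb" -lt $(( need_gb * 1024 * 1024 )) ]; then
    echo "only $(( avail_kb / 1024 / 1024 )) GB free in $dir; need ${need_gb} GB" >&2
    return 1
  fi
  echo "ok: $(( avail_kb / 1024 / 1024 )) GB free in $dir"
}
```

For example, `check_free_gb /data/$USER 350` if the intermediate directories will be deleted, or `check_free_gb /data/$USER 1000` if they will be kept.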
NOTE: example files can be copied into your personal /data area to run a test job:
$ mkdir /data/$USER/hiseq
$ cd /data/$USER/hiseq
$ cp /usr/local/apps/hiseq/NexteraRapidCapture_DemoData.tar.gz . && tar xvzf NexteraRapidCapture_DemoData.tar.gz
# this file was downloaded originally from ftp://webdata:email@example.com/Data/SequencingRuns/NexteraRapidCapture/NexteraRapidCapture_DemoData.tar.gz
$ cd NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx/
Then modify the last line of the SampleSheet.csv file so that the genome already downloaded on our system is used.
Alternatively, and more simply, copy the modified version of SampleSheet.csv from the shared area over the one in your directory:
$ cp -f /usr/local/apps/hiseq/SampleSheet.csv .
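If you prefer to edit the sample sheet yourself, the following is a minimal sketch that backs up the file and replaces only its last line. The function name replace_last_line is hypothetical, and the replacement text is a placeholder you must supply: the exact genome entry expected on our system is not spelled out on this page, so compare against the shared copy in /usr/local/apps/hiseq/SampleSheet.csv.

```bash
#!/bin/bash
# Hypothetical helper: replace_last_line FILE "NEW LAST LINE"
# Saves the original as FILE.orig, then rewrites FILE with its last line replaced.
replace_last_line() {
  local file=$1 newline=$2
  cp -p "$file" "$file.orig" || return 1        # keep the original for comparison
  sed '$d' "$file.orig" > "$file" || return 1    # copy every line except the last
  printf '%s\n' "$newline" >> "$file"            # append the replacement last line
}
```

Usage: `replace_last_line SampleSheet.csv "<new last line>"`, then `diff SampleSheet.csv.orig SampleSheet.csv` to confirm that only the last line changed.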
1. Create a script file along the lines below. Note that the script assumes all the input files are already in place from the steps above.
#!/bin/bash
# This file is hiseqfile
#PBS -N hiseq
#PBS -m be
#PBS -k oe

module load hiseq
RUNDIR=/data/$USER/hiseq/NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx
cd "$RUNDIR" || exit 1
# Stop here if RunLatest exits with an error, so the per-lane raw data are not deleted
RunLatest -r "$RUNDIR/" || exit 1
# Free disk space: the per-lane intensity and base-call directories are no longer needed once the FASTQ files exist
rm -rf "$RUNDIR"/Data/Intensities/L00*
rm -rf "$RUNDIR"/Data/Intensities/BaseCalls/L00*
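Before submitting, a quick pre-flight check can catch a missing or incomplete run folder. This is a minimal sketch with a hypothetical function name; it only tests for the two items this page relies on (SampleSheet.csv and Data/Intensities), not for everything RunLatest may need.

```bash
#!/bin/bash
# Hypothetical helper: check_run_folder RUNDIR
# Verifies that the sample sheet and the Intensities directory are present.
check_run_folder() {
  local rundir=$1 path
  for path in "$rundir/SampleSheet.csv" "$rundir/Data/Intensities"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path" >&2
      return 1
    fi
  done
  echo "run folder looks complete: $rundir"
}
```

Usage: `check_run_folder /data/$USER/hiseq/NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx`.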
2. Submit the script from the Biowulf login node.
The job described above was run on a g24 node.
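A sketch of the submission step, assuming the same node request as the interactive example on this page (`-l nodes=1:g24:c24`) and the script name hiseqfile. The wrapper name submit_hiseq and the QSUB override are hypothetical conveniences: setting QSUB=echo prints the arguments instead of submitting, which is useful as a dry run.

```bash
#!/bin/bash
# Hypothetical wrapper: submit_hiseq [SCRIPT]
# Submits SCRIPT (default: hiseqfile) to one g24 node; set QSUB=echo for a dry run.
submit_hiseq() {
  local script=${1:-hiseqfile}
  if [ ! -f "$script" ]; then
    echo "batch script not found: $script" >&2
    return 1
  fi
  ${QSUB:-qsub} -l nodes=1:g24:c24 "$script"
}
```

Without the wrapper, the equivalent one-liner would be `qsub -l nodes=1:g24:c24 hiseqfile`.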
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node; instead, allocate an interactive node as described below and run the job there.
biowulf $ qsub -I -l nodes=1:g24:c24
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@pXXX]$ module load hiseq
[user@pXXX]$ cd /data/$USER/XXXX
[user@pXXX]$ RunLatest -r
[user@pXXX]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$
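To avoid accidentally starting RunLatest on the login node, a small guard can be placed in front of it. This is a minimal sketch that assumes the login node's short hostname is "biowulf", as the prompts on this page suggest; the function name not_on_login_node is hypothetical, and the hostname check should be adjusted if the login node is named differently.

```bash
#!/bin/bash
# Hypothetical guard: not_on_login_node [HOSTNAME]
# Returns 1 (and prints a hint) when running on the login node, 0 otherwise.
# Assumption: the login node's short hostname is "biowulf".
not_on_login_node() {
  local host=${1:-$(hostname -s)}
  if [ "$host" = "biowulf" ]; then
    echo "This is the Biowulf login node; allocate a node with 'qsub -I' first." >&2
    return 1
  fi
}
```

Usage on the allocated node: `not_on_login_node && RunLatest -r`.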