Biowulf at the NIH
HiSeq on Biowulf

HiSeq Analysis Software provides rapid alignment and variant calling for whole human genomes and for libraries prepared with the Nextera Rapid Capture (NRC) exome enrichment kit. For whole human genome sequencing, it uses the Isaac analysis workflow, which is reported to run 4-6 times faster than existing methods while remaining accurate. For NRC analysis, it uses BWA for alignment and GATK for variant calling. The software can be run from the command line or through a graphical user interface called Analysis Visual Controller Software (AVC). More details on the supported workflows:

 

Feedback from one of our users:

A job was submitted to a g24 node and finished successfully.

Details of this job:
- It included 48 samples from one HiSeq 1000 run.
- The primary input files totaled ~150-200 GB.
- After the fastq files have been generated, the Intensities/L00X and BaseCalls/L00X files can be deleted to save disk space.
- It is estimated that the run, left unattended, would have needed 1 TB of disk space. The fastq files amount to ~170 GB, and the output in the “Alignment” directory (the .bam and .vcf files being the most relevant for downstream analyses) amounts to another ~170 GB, for a total of roughly 350 GB.
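
The disk usage figures above can be checked on your own run folder with standard tools. The following is a minimal sketch: report_usage is a hypothetical helper (it is not part of the HiSeq software), and the subdirectory names passed to it are assumptions about where the data sit in your run folder.

```shell
#!/bin/bash
# report_usage is a hypothetical helper, not part of the hiseq module.
# Usage: report_usage RUN_FOLDER [SUBDIR ...]
# Prints the disk usage of the run folder, then of each listed
# subdirectory, and flags any subdirectory that does not exist.
report_usage() {
    local run_dir=$1
    shift
    if [ ! -d "$run_dir" ]; then
        echo "report_usage: no such directory: $run_dir" >&2
        return 1
    fi
    # Total for the whole run folder
    du -sh "$run_dir"
    local sub
    for sub in "$@"; do
        if [ -d "$run_dir/$sub" ]; then
            du -sh "$run_dir/$sub"
        else
            echo "missing: $run_dir/$sub"
        fi
    done
}

# Example call (paths are assumptions; adjust them to your run folder):
# report_usage /data/$USER/hiseq/MyRunFolder Data/Intensities Alignment
```

Running it before and after deleting the L00X directories shows how much space the cleanup actually frees.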

 

Running a batch job

NOTE: Example files can be copied into your personal data area, as shown below, and used to run a test job.

$ mkdir /data/$USER/hiseq
$ cd /data/$USER/hiseq
$ cp /usr/local/apps/hiseq/NexteraRapidCapture_DemoData.tar.gz . ; tar xvfz NexteraRapidCapture_DemoData.tar.gz
# this file was downloaded originally from ftp://webdata:webdata@ussd-ftp.illumina.com/Data/SequencingRuns/NexteraRapidCapture/NexteraRapidCapture_DemoData.tar.gz
$ cd NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx/

Then modify the last line of the SampleSheet.csv file so that the pre-downloaded genome on our system is used.
Or, even simpler, copy the modified version of SampleSheet.csv from the shared area to replace the one in your area:

$ cp -f /usr/local/apps/hiseq/SampleSheet.csv .
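
As an alternative to the plain copy, the original sample sheet can be kept so that the change to its last line can be inspected. This is a minimal sketch: replace_sample_sheet and the .orig backup suffix are illustrative assumptions, not part of the hiseq installation.

```shell
#!/bin/bash
# replace_sample_sheet is a hypothetical helper, not part of the hiseq module.
# Usage: replace_sample_sheet SHARED_SHEET [TARGET_SHEET]
# Backs up TARGET_SHEET once as TARGET_SHEET.orig, copies SHARED_SHEET
# over it, and prints the differences between the old and new versions.
replace_sample_sheet() {
    local shared=$1 target=${2:-SampleSheet.csv}
    if [ ! -f "$shared" ]; then
        echo "replace_sample_sheet: not found: $shared" >&2
        return 1
    fi
    # Back up the existing sheet only if no backup exists yet
    if [ -f "$target" ] && [ ! -f "$target.orig" ]; then
        cp -p "$target" "$target.orig"
    fi
    cp -f "$shared" "$target" || return 1
    # diff exits with status 1 when the files differ, which is expected here
    if [ -f "$target.orig" ]; then
        diff "$target.orig" "$target" || true
    fi
}

# Example call from inside the run folder:
# replace_sample_sheet /usr/local/apps/hiseq/SampleSheet.csv
```

Lines prefixed with < in the output come from your original sheet, and lines prefixed with > come from the shared version.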

1. Create a script file along the lines below. Note that the script assumes all the input files are already in place from the steps above.

#!/bin/bash
# This file is hiseqfile
#PBS -N hiseq
#PBS -m be
#PBS -k oe

module load hiseq

# Run folder prepared in the steps above
RUNDIR=/data/$USER/hiseq/NexteraRapidCapture_DemoData/130123_SN7001282_0165_Bd1ua6acxx
cd "$RUNDIR" || exit 1

# Stop if the analysis fails so that the raw data are kept
# (assumes RunLatest returns a non-zero exit status on failure)
RunLatest -r "$RUNDIR/" || exit 1

# The per-lane intensity and base call files are no longer needed; remove them to save disk space
rm -rf "$RUNDIR"/Data/Intensities/L00*
rm -rf "$RUNDIR"/Data/Intensities/BaseCalls/L00*

2. Submit the script from the Biowulf login node:

biowulf $ qsub -l nodes=1:g24:c24 /data/$USER/TheFileAbove

This submits the job to a g24 node.
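
Once the job is running, its output can be followed from the login node. With the -N hiseq and -k oe directives in the script, PBS normally keeps the output and error streams in your home directory as hiseq.o<jobid> and hiseq.e<jobid>. The following minimal sketch locates the newest such file; latest_job_output is a hypothetical helper, and the file location is an assumption based on standard PBS behavior rather than something confirmed for this cluster.

```shell
#!/bin/bash
# latest_job_output is a hypothetical helper, not a PBS command.
# Usage: latest_job_output JOB_NAME [DIR]
# Prints the most recently modified JOB_NAME.o* file in DIR
# (DIR defaults to the home directory).
latest_job_output() {
    local name=$1 dir=${2:-$HOME}
    local newest="" f
    for f in "$dir/$name".o*; do
        # Skip the unexpanded pattern when nothing matches
        [ -f "$f" ] || continue
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -z "$newest" ]; then
        echo "latest_job_output: no output file for job '$name' in $dir" >&2
        return 1
    fi
    echo "$newest"
}

# Example: follow the output of the most recent hiseq job
# tail -f "$(latest_job_output hiseq)"
```

The job state itself can be checked with qstat -u $USER.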

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.


biowulf $ qsub -I -l nodes=1:g24:c24
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@pXXX]$ module load hiseq
        
[user@pXXX]$ cd /data/$USER/XXXX

[user@pXXX]$ RunLatest -r
[user@pXXX]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$ 

Documentation

hiseq.pdf