Biowulf at the NIH
RSeQC on Biowulf

The RSeQC package provides a number of useful modules that comprehensively evaluate high-throughput sequence data, especially RNA-seq data. The "basic modules" quickly inspect sequence quality, nucleotide composition bias, PCR bias, and GC bias, while the "RNA-seq-specific modules" investigate the sequencing saturation status of both splice-junction detection and expression estimation, the mapped-read clipping profile, mapped-read distribution, coverage uniformity over the gene body, reproducibility, strand specificity, and splice-junction annotation.

The paths for the appropriate version of Python and the RSeQC Python scripts are most easily set up by using the module commands, as in the example below:

biowulf% module avail rseqc

-------------- /usr/local/Modules/3.2.9/modulefiles -----------------------
rseqc/2.3
biowulf% module load rseqc

biowulf% module list
Currently Loaded Modulefiles:
  1) rseqc/2.3
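
Once the module is loaded, the RSeQC scripts should be on your PATH. A quick sanity check (a sketch only; the three script names below are tools from the RSeQC package mentioned on this page):

```shell
# Report whether each RSeQC script is visible after 'module load rseqc'.
for tool in bam_stat.py read_NVC.py bam2wig.py; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found on PATH"
    else
        echo "$tool: NOT found -- is the rseqc module loaded?"
    fi
done
```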

Submitting a Single Batch Job

1. Create a script file along the lines of the one below:

#!/bin/bash
# This file is FileName
#
#PBS -N RunName
#PBS -m be
#PBS -k oe

# load the latest version of RSeQC
module load rseqc

cd /data/user/somewhereWithInputfile

# report reads mapping statistics from BAM file
bam_stat.py -i myfile.bam &> output

# check nucleotide composition bias
read_NVC.py -i myfile.bam -o nuc_comp -x 

# convert file from BAM format to wiggle format
bam2wig.py -i myfile.bam -s chromsize.file -o mywig 
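
The -s option of bam2wig.py takes a tab-separated file of chromosome names and lengths. A minimal sketch of building one by hand (the lengths below are illustrative placeholders; on Biowulf you would normally derive them from your reference genome or BAM header):

```shell
# Build a two-column chromosome-size file: name <TAB> length.
# Lengths shown are placeholder values, not tied to any particular genome build.
printf 'chr1\t248956422\nchr2\t242193529\n' > chromsize.file
cat chromsize.file
```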

2. Submit the script using the 'qsub' command on Biowulf.

qsub -l nodes=1 /data/username/theScriptFileAbove

This script will run on a node with the defaults of at least 1 GB of memory and 2 cores. If your RSeQC commands require more than 1 GB of memory, specify the node type on the qsub command line. For example, if your commands require 7 GB of memory, submit to an 8 GB node with:
qsub -l nodes=1:g8 /data/username/theScriptFileAbove

Submitting a Swarm of Jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g., /data/username/cmdfile). Here is a sample file:

module load rseqc; bam_stat.py -i file1.bam &> bam_stats1.out
module load rseqc; bam_stat.py -i file2.bam &> bam_stats2.out
module load rseqc; bam_stat.py -i file3.bam &> bam_stats3.out
[....]
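
For many input files, a repetitive command file like the one above can be generated with a short shell loop rather than written by hand. A sketch, assuming input files named file1.bam, file2.bam, ... (adjust the list or use a glob for your own data):

```shell
# Write one swarm line per BAM file; the numeric suffix of the input
# name is reused for the output name (e.g. file2.bam -> bam_stats2.out).
for f in file1.bam file2.bam file3.bam; do
    n=${f%.bam}          # strip the .bam extension
    n=${n#file}          # keep only the numeric suffix
    echo "module load rseqc; bam_stat.py -i $f &> bam_stats$n.out"
done > cmdfile
```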

Submit this swarm with

swarm -f cmdfile

By default, each line of the command file above is executed on a single processor core of a node and uses 1 GB of memory. If each command requires more than 1 GB of memory, specify the memory required when you submit the swarm. For example, if each command requires 4 GB of memory, submit with:

swarm -g 4 -f cmdfile

For more information regarding running swarm, see swarm.html


Running an Interactive Job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf% qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ cd /data/user/myruns
[user@p4]$ module load rseqc
[user@p4]$ cd /data/user/somewhereWithInputfile
[user@p4]$ overlay_bigwig.py -i bigwigfile1 -j bigwigfile2 -a Average -o out.wig
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$

Users may add a node property to the qsub command to request a specific kind of interactive node. For example, if you need a node with 8 GB of memory to run a job interactively, use:

biowulf% qsub -I -l nodes=1:g8

Documentation

http://dldcc-web.brc.bcm.edu/lilab/liguow/CGI/rseqc/_build/html/index.html