Biowulf at the NIH
RUM on Biowulf

RUM is an alignment, junction calling, and feature quantification pipeline specifically designed for Illumina RNA-Seq data. RUM can also be used effectively for DNA sequencing (e.g. ChIP-Seq) and microarray probe mapping, and it has a strand-specific mode. RUM is highly configurable, but it does not require fussing over options: the defaults generally give good results. RUM is developed by Gregory Grant at the University of Pennsylvania.

The environment variable(s) need to be set properly first. The easiest way to do this is with the module commands, as in the example below.

$ module avail rum
---------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------------
rum/1.09              rum/1.11              rum/2.0.1_01(default)


$ module load rum

$ module list
Currently Loaded Modulefiles:
1) rum/2.0.1_01

$ module unload rum
$ module load rum/1.11

$ module show rum
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/rum/2.0.1_01:

module-whatis    Sets up rum 2.0.1_01
prepend-path     PATH /usr/local/apps/rum/2.0.1_01/bin
-------------------------------------------------------------------


Submitting a single batch job

1. Create a script file along the following lines. Run 'module load rum; rum_runner help' for help. Prebuilt indexes and configuration files are under /usr/local/apps/rum/indexes/ORGANISM

#!/bin/bash
# This file is rumScript
#
#PBS -N rum
#PBS -m be
#PBS -k oe

module load rum
rum_runner align -i /usr/local/apps/rum/indexes/hg19 -o /data/$USER/rum/out --chunks 3 --name rumName \
--platform Local /data/$USER/1.fq /data/$USER/2.fq

Note: replace '3' with the correct chunk number based on the node your job is submitted to. For a human genome, each chunk will use about 6 GB of RAM. For example, a g24 node has 24 GB of RAM, of which about 22 GB is available, so 3 chunks (about 18 GB of memory) should be used. A g8 node has only 8 GB of memory, so only 1 chunk should be specified.
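As a quick check, the chunk count is simply the node's available memory divided by the per-chunk requirement. Here is a minimal sketch of that arithmetic, using the figures above (~6 GB per chunk for a human genome; adjust AVAIL_GB for your target node):

#!/bin/bash
# Sketch: derive --chunks from available memory, assuming ~6 GB of RAM
# per chunk for a human genome (figures from the note above).
AVAIL_GB=22                            # usable memory on a g24 node
GB_PER_CHUNK=6
CHUNKS=$(( AVAIL_GB / GB_PER_CHUNK ))  # 22 / 6 = 3
echo "Use --chunks $CHUNKS"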

2. Submit the script using the 'qsub' command on Biowulf.

$ qsub -l nodes=1:g24:c24 ./rumScript
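To submit the same job to an 8 GB node instead, first change --chunks 3 to --chunks 1 in the script (see the memory note above), then request a g8 node:

$ qsub -l nodes=1:g8 ./rumScript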

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node.

Allocate an interactive node as described below, and run the interactive job there.

biowulf% qsub -I -l nodes=1:g8
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

$ cd /data/$USER/run
$ module load rum
$ rum_runner align -i /usr/local/apps/rum/indexes/mm9 -o /data/$USER/rum/out --chunks 1 --name rumName \
--platform Local /data/$USER/1.fq /data/$USER/2.fq
$ exit
qsub: job 2236960.biobos completed
$


Submitting a swarm of rum jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file:

module load rum; cd /data/$USER/run1; rum_runner align -i /usr/local/apps/rum/indexes/hg19 -o /data/$USER/rum/out1 --chunks 3 --name rumName --platform Local 1.fq 2.fq
module load rum; cd /data/$USER/run2; rum_runner align -i /usr/local/apps/rum/indexes/hg19 -o /data/$USER/rum/out2 --chunks 3 --name rumName --platform Local 1.fq 2.fq
module load rum; cd /data/$USER/run3; rum_runner align -i /usr/local/apps/rum/indexes/hg19 -o /data/$USER/rum/out3 --chunks 3 --name rumName --platform Local 1.fq 2.fq
[......]
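For a larger number of runs, a short loop can generate this command file. This is only a sketch, assuming the /data/$USER/runN directory layout and the per-run output directories used in the sample above:

# Write one rum_runner line per run directory into the swarm command file
for i in 1 2 3; do
    echo "module load rum; cd /data/$USER/run$i; rum_runner align -i /usr/local/apps/rum/indexes/hg19 -o /data/$USER/rum/out$i --chunks 3 --name rumName --platform Local 1.fq 2.fq"
done > /data/$USER/cmdfile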

By default, each line of the command file above will be executed on one processor core of a node and use 1 GB of memory. Since we want a whole node (> 18 GB) for each command line, specify -g 24 when submitting the swarm job.

biowulf% swarm -g 24 -f TheFileAbove

For more information on running swarm, see swarm.html.

Documentation

https://github.com/PGFI/rum/wiki