Biowulf at the NIH
MAP-RSeq on Helix & Biowulf
MAP-RSeq, the Mayo Analysis Pipeline for RNA Sequencing, offers an end-to-end solution to analyze and interpret next-generation RNA sequencing data.

Setting up your MAP-RSeq job

The following example uses the sample run provided with the MAP-RSeq package. First, copy the sample data and configuration files to your own area, then edit the configuration files appropriately.

biowulf% mkdir /data/$USER/maprseq; cd /data/$USER/maprseq

biowulf% module load maprseq

biowulf% cp -r $MAPRSEQ_HOME/sample_data .

biowulf% cp $MAPRSEQ_HOME/config/* .

Edit the file run_info.txt to set the following variables appropriately:

PI=my_username
MEMORY_INFO=/data/$USER/maprseq/memory_info.txt
SAMPLE_INFO=/data/$USER/maprseq/sample_info.txt
TOOL_INFO=/data/$USER/maprseq/tool_info.txt
INPUT_DIR=/data/$USER/maprseq/sample_data
BASE_OUTPUT_DIR=/data/$USER/maprseq/sample_output
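
If you prefer not to edit the file by hand, the same settings can be applied with sed. The following is only a sketch: it assumes each variable appears once as a KEY=VALUE line, it works on a scratch copy with stand-in values rather than the real run_info.txt, and set_var is a hypothetical helper, not part of MAP-RSeq.

```shell
#!/bin/bash
# Sketch: fill in the run_info.txt variables listed above.
# Runs against a scratch copy; set base=/data/$USER/maprseq to use it for real.
set -eu

user=${USER:-$(id -un)}
base=$(mktemp -d)                 # stand-in for /data/$USER/maprseq
run_info="$base/run_info.txt"

# Stand-in config holding only the six variables discussed here.
# The real file copied from $MAPRSEQ_HOME/config contains more.
cat > "$run_info" <<'EOF'
PI=unset
MEMORY_INFO=unset
SAMPLE_INFO=unset
TOOL_INFO=unset
INPUT_DIR=unset
BASE_OUTPUT_DIR=unset
EOF

# set_var KEY VALUE FILE (hypothetical helper): rewrite the KEY=... line.
# '|' is the sed delimiter so '/' in paths needs no escaping; values
# containing '|' or '&' would have to be escaped first.
set_var() {
    sed "s|^$1=.*|$1=$2|" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

set_var PI              "$user"                 "$run_info"
set_var MEMORY_INFO     "$base/memory_info.txt" "$run_info"
set_var SAMPLE_INFO     "$base/sample_info.txt" "$run_info"
set_var TOOL_INFO       "$base/tool_info.txt"   "$run_info"
set_var INPUT_DIR       "$base/sample_data"     "$run_info"
set_var BASE_OUTPUT_DIR "$base/sample_output"   "$run_info"

cat "$run_info"
```

To apply this to a real setup, point base at your own MAP-RSeq directory and remove the step that writes the stand-in file.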

For your own data, you may need to change the values for other variables. See the MAP-RSeq user guide for details.

You can also change the values in memory_info.txt to set the number of threads.

#### Thread requirements
THREADS=20
ALIGN_THREADS=20
SORT_THREADS=20

In the above example, 20 threads have been requested, so it is important to run on a Biowulf node with at least 20 cores.
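
A quick way to check this before starting a run is to compare the largest thread setting with the core count of the node you are on. This sketch assumes the thread variables are plain KEY=VALUE lines whose names end in THREADS, as in the excerpt above. It uses a scratch copy of memory_info.txt, and max_threads is a hypothetical helper. The count from getconf is the number of online processors, which may be more than the cores actually allocated to your job.

```shell
#!/bin/bash
# Sketch: warn if memory_info.txt requests more threads than this node has cores.
set -eu

# max_threads FILE (hypothetical helper): print the largest value among
# lines of the form [PREFIX_]THREADS=N; prints 0 if there are none.
max_threads() {
    awk -F= '/^[A-Z_]*THREADS=/ { if ($2 + 0 > m) m = $2 + 0 } END { print m + 0 }' "$1"
}

# Scratch copy holding the thread settings shown above.
mem_info=$(mktemp)
cat > "$mem_info" <<'EOF'
#### Thread requirements
THREADS=20
ALIGN_THREADS=20
SORT_THREADS=20
EOF

wanted=$(max_threads "$mem_info")
cores=$(getconf _NPROCESSORS_ONLN)   # online processors on the current node

if [ "$wanted" -gt "$cores" ]; then
    echo "WARNING: $wanted threads requested but only $cores cores available"
else
    echo "OK: $wanted threads requested, $cores cores available"
fi
```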

Running MAP-RSeq on Helix

It is best to run MAP-RSeq on Biowulf rather than Helix, because of its memory requirements and its dependence on specific library versions.

Running MAP-RSeq on Biowulf

Set up a batch script along the following lines:

#!/bin/bash
#PBS -N MapRSeq
#PBS -m be

cd /data/$USER/maprseq
module load maprseq
mrna.pl -r=/path/to/run_info.txt > job.log 2>&1

Submit this job with a command like:

biowulf% qsub -l nodes=1:c24:g24 myscript.bat

The log file job.log reports the status of the job. You can follow it with a command like 'tail -f job.log'. In addition, the MAP-RSeq program writes its own logs, which you can examine for errors.
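
For a first pass over a log, a case-insensitive grep is often enough. This is a sketch, not a MAP-RSeq feature: scan_log is a hypothetical helper, and the log lines in the demo are made-up stand-ins. MAP-RSeq echoes the commands that create its own "error" directory, so the sketch skips echoed Command lines to avoid false matches.

```shell
#!/bin/bash
# Sketch: list log lines that mention "error" or "fail", ignoring echoed commands.
set -eu

# scan_log FILE (hypothetical helper).
# -n prefixes line numbers, -i ignores case, -E allows the alternation.
# The second grep drops lines such as: Command "mkdir -p .../error"
# grep exits 1 when nothing matches, so "|| true" keeps set -e from aborting.
scan_log() {
    grep -niE 'error|fail' "$1" | grep -vE '^[0-9]+:Command "' || true
}

# Made-up stand-in log, NOT real MAP-RSeq output.
log=$(mktemp)
cat > "$log" <<'EOF'
Command "mkdir -p /tmp/example/error"
ERROR: made-up failure line
Step complete
EOF

scan_log "$log"
```

On a real run you would call scan_log job.log from the job directory. An empty result does not prove the run succeeded, so the program's own logs are still worth checking.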

Running MAP-RSeq interactively

Allocate an interactive node on Biowulf, and run the process there as in the example below.

biowulf% qsub -I -l nodes=1:c24:g24
qsub: waiting for job 6806847.biobos to start
qsub: job 6806847.biobos ready

[susanc@p2347 ~]$ cd /data/susanc/maprseq
[susanc@p2347 maprseq]$ module load maprseq
[susanc@p2347 maprseq]$ 
[susanc@p2347 maprseq]$ mrna.pl -r=./run_info.txt
/spin1/sys/i386/usrlocal/apps/maprseq/1.2.1
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/job_ids"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/error"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/RSeQC"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/variant/logs"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/variant/temp"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/variant/plot"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/fastqc"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/fastq"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/fusion"
Command "mkdir -p /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/logs"
Command "cp ./run_info.txt /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/run_info.txt"
Command "cp /data/susanc/maprseq/tool_info.txt /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/tool_info.txt"
Command "cp /data/susanc/maprseq/sample_info.txt /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/sample_info.txt"
Command "cp /data/susanc/maprseq/memory_info.txt /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/memory_info.txt"
Command "cp /usr/local/apps/maprseq/1.2.1/src/mrnaseq_workflow.png /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/."
Command "cp -r /usr/local/apps/maprseq/1.2.1/src/fancybox /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/."
Command "cp /usr/local/apps/maprseq/1.2.1/src/IGV_Setup.doc /data/susanc/maprseq/sample_output/susanc/mrnaseq/test/."
Command " /usr/local/apps/maprseq/1.2.1/src/modify_gtf.pl -o=/data/susanc/maprseq/sample_output/susanc/mrnaseq/test -r=/data/susanc/maprseq/sample_output/susanc/mrnaseq/test/run_info.txt"
Command " /usr/local/apps/maprseq/1.2.1/src/sampling.pl -o=/data/susanc/maprseq/sample_output/susanc/mrnaseq/test -r=/data/susanc/maprseq/sample_output/susanc/mrnaseq/test/run_info.txt -s=SAMPLE1"

[2014-07-07 13:16:53] Beginning TopHat run (v2.0.6) [Process: PreProcess]
-----------------------------------------------
[2014-07-07 13:16:53] Checking for Bowtie
      Bowtie version: 0.12.9.0
[2014-07-07 13:16:53] Checking for Samtools
      Samtools version: 0.1.19.0
[2014-07-07 13:16:53] Checking for Bowtie index files
[2014-07-07 13:16:53] Checking for Bowtie index files
[2014-07-07 13:16:53] Checking for reference FASTA file
[2014-07-07 13:16:53] Generating SAM header for /usr/local/apps/maprseq/1.2.1/references/hg19/37.1/indexed/allchr
      format: fastq
      quality scale: phred64 (reads generated with GA pipeline version >= 1.3)
[2014-07-07 13:17:56] Reading known junctions from GTF file
[2014-07-07 13:18:01] Preparing reads
       left reads: min. length=51, max. length=51, 249967 kept reads (33 discarded)
      right reads: min. length=50, max. length=50, 249959 kept reads (41 discarded)
-----------------------------------------------
[2014-07-07 13:18:07] Step PreProcess complete: 00:01:14 elapsed

[2014-07-07 13:18:08] Beginning TopHat run (v2.0.6) [Process: InitialAlign]
-----------------------------------------------
[....etc...]

This run uses about 3 GB of memory.

Documentation

MAP-RSeq website

MAP-RSeq user guide