Biowulf at the NIH
MapSplice on Helix & Biowulf

Accurate mapping of RNA-seq reads for splice junction discovery.

MapSplice was developed at the University of Kentucky Bioinformatics Lab.

biowulf$ module avail mapsplice

-------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
mapsplice/1.15.2

biowulf$ module load mapsplice

biowulf$ module list
Currently Loaded Modulefiles:
  1) mapsplice/1.15.2

Running on Helix

helix $ module load mapsplice
helix $ cd /data/$USER/dir
helix $ python $MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>

Submitting a single batch job

1. Create a batch script file similar to the one below:

#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load mapsplice
cd /data/$USER/mydir
python $MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>

2. Submit the script using the 'qsub' command on Biowulf, e.g.

[user@biowulf]$ qsub -l nodes=1 /data/$USER/theScriptFileAbove

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g., /data/username/cmdfile). Here is a sample file:

cd /data/user/mydir1; python $MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>
cd /data/user/mydir2; python $MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>
cd /data/user/mydir3; python $MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>
   [...]   
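If your run directories follow a numbered pattern, the command file above can be generated with a short shell loop rather than written by hand. This is a sketch: the directory names (mydir1..mydir3) and the bracketed placeholders are examples only; substitute your own paths, options, and input files.

```shell
# Generate a swarm command file, one MapSplice command per run directory.
# 'mydir1'..'mydir3' and the <...> placeholders are illustrative only.
cmdfile=cmdfile
: > "$cmdfile"   # truncate/create the command file
for i in 1 2 3; do
  echo "cd /data/$USER/mydir$i; python \$MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>" >> "$cmdfile"
done
```

Note that $MSBIN is escaped so it is expanded when swarm runs each line (after 'module load mapsplice'), not when the file is generated.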

Submit this job with

swarm -f cmdfile --module mapsplice

By default, each line of the command file above runs on one core of a node and may use up to 1 GB of memory. The Bowtie stage of MapSplice can run in multi-threaded mode. If you specify more than one thread for Bowtie (with '-p #' or '--threads #'), you must tell swarm how many threads each command will use via the '-t #' flag to swarm. For example, if you set '--threads 4', submit the swarm with:

swarm -t 4 -f cmdfile --module mapsplice

If each command requires more than 1 GB of memory, you must tell swarm how much using the '-g #' flag. For example, if each MapSplice command (a single line in the file above) requires 10 GB of memory and runs with 4 threads, submit the swarm with:

swarm -g 10 -t 4 -f cmdfile --module mapsplice
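The '-t' and '-g' flags interact when swarm packs commands onto nodes: with 4 threads per command, a 16-core node (an assumed node size; adjust to the node type you request) runs 4 commands at once, so the node must have enough memory for all of them. A quick back-of-the-envelope check:

```shell
# Rough capacity check for swarm packing. The node size is an
# assumption for illustration; match it to your actual node type.
cores_per_node=16     # assumed node size
threads_per_cmd=4     # matches 'swarm -t 4'
mem_per_cmd_gb=10     # matches 'swarm -g 10'
cmds_per_node=$(( cores_per_node / threads_per_cmd ))
total_mem_gb=$(( cmds_per_node * mem_per_cmd_gb ))
echo "$cmds_per_node commands/node, $total_mem_gb GB total"
# prints: 4 commands/node, 40 GB total
```

In this example the node would need at least 40 GB of memory to run all four commands concurrently.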

For more information regarding running swarm, see swarm.html


Running an interactive job

Users may occasionally need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf] $ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ cd /data/user/myruns
[user@p4]$ module load mapsplice
[user@p4]$ cd /data/$USER/mapsplice/run1
[user@p4]$ python $MSBIN/mapsplice.py [options] -c <Reference_Sequence> -x <Bowtie_Index> -1 <Read_List1> -2 <Read_List2>
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$

You may add a node property to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run your job interactively, do this:

[user@biowulf]$ qsub -I -l nodes=1:g24:c16


Documentation

http://www.netlab.uky.edu/p/bioinfo/MapSplice2UserGuide