Biowulf at the NIH
MISO on Helix & Biowulf

MISO (Mixture-of-Isoforms) is a probabilistic framework that quantitates the expression level of alternatively spliced genes from RNA-Seq data, and identifies differentially regulated isoforms or exons across samples. By modeling the generative process by which reads are produced from isoforms in RNA-Seq, the MISO model uses Bayesian inference to compute the probability that a read originated from a particular isoform.
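
Schematically (a simplified sketch of such a mixture model, not MISO's exact likelihood), for a gene with isoforms $I_1, \dots, I_K$ and isoform abundances $\Psi$, a set of reads $R = \{r_1, \dots, r_N\}$ is modeled as

$$P(R \mid \Psi) = \prod_{n=1}^{N} \sum_{k=1}^{K} P(r_n \mid I_k)\, P(I_k \mid \Psi), \qquad P(\Psi \mid R) \propto P(R \mid \Psi)\, P(\Psi),$$

and MISO samples from the posterior over $\Psi$ (the "percent spliced in" values) for each gene or event.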

The MISO framework is described in Katz et al., "Analysis and design of RNA sequencing experiments for identifying isoform regulation", Nature Methods (2010).

Running on Helix
$ module load miso
$ cd /data/$USER/miso
$ miso --run ./indexed ./accepted_hits.bam \
       --output-dir miso_out \
       --read-len 76
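
The ./indexed argument above is a directory holding a MISO index of the alternative-event annotations (GFF). If you have not already built one, it can be created with MISO's index_gff utility; a minimal sketch, assuming a GFF3 annotation file named SE.events.gff3 (substitute your own annotation file; depending on the installed MISO version the script may be named index_gff or index_gff.py):

$ index_gff --index SE.events.gff3 ./indexed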

Submitting a single batch job on Biowulf

1. Create a batch script file. A sample batch script file is shown below:

#!/bin/bash
# This file is the batch script, e.g. saved as 'script'
#
#PBS -N miso
#PBS -m be
#PBS -k oe

module load miso
cd /data/$USER/miso
miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76

2. Submit the script using the 'qsub' command on Biowulf, for example:

$ qsub -l nodes=1 ./script

You may need to run a few test jobs to determine the amount of memory required, and then choose a node type suitable for your job.
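
For example, once the memory requirement is known, a specific node type can be requested in the qsub resource list (g8 below is only an illustration; substitute the node type that matches your job's memory needs):

$ qsub -l nodes=1:g8 ./script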

Running an interactive job on Biowulf

Users may occasionally need to run jobs interactively. Such jobs should not be run on the Biowulf login node.

Allocate an interactive node as described below, and run the interactive job there. Alternatively, run interactively on Helix.

biowulf % qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

$ module load miso  
$ cd /data/$USER/miso
$ miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/$USER/cmdfile). Here is a sample file; a short loop for generating such a file is shown after the listing. Note that each command must be on a single line; do not add line breaks within a command. Also note that each job runs in its own subdirectory.

cd /data/$USER/run1; module load miso; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76
cd /data/$USER/run2; module load miso; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76
cd /data/$USER/run3; module load miso; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76
....
....
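
For many run directories, the command file can be generated rather than written by hand. A minimal sketch, assuming the run directories are named /data/$USER/run1, /data/$USER/run2, ... as above and each contains its own indexed/ and accepted_hits.bam:

#!/bin/bash
# Write one swarm command line per run directory (illustrative helper, not part of MISO)
for d in /data/$USER/run*; do
    echo "cd $d; module load miso; miso --run ./indexed ./accepted_hits.bam --output-dir miso_out --read-len 76"
done > /data/$USER/cmdfile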

Swarm requires one flag, -f; users will probably also want to specify -t, -g, and --module:

-f: the swarm command file created above (required)
-t: number of processors to allocate to each command in the swarm file (optional; the example below uses 4)
-g: GB of memory needed for each command in the swarm file (optional)

Tell swarm how many cores each command should use with the -t switch; by default, each command gets 1 core. If each command requires, say, 12 GB of memory, specify this with the -g 12 switch. Thus, this swarm command file can be submitted with:

biowulf $ swarm -g 12 -t 4 -f cmdfile

Users may need to run a few test jobs to determine how much memory is used. Set up a single job, then submit it. The output from the job will list the memory used by that job.

For more information regarding running swarm, see swarm.html


Submitting a parallel job

A single MISO job can also be split into several smaller subjobs, each run on its own node.

biowulf $ module load miso
biowulf $ cd /data/$USER/miso
biowulf $ miso --run ./indexed ./accepted_hits.bam \
               --output-dir outdir \
               --read-len 76 \
               --settings-filename=/usr/local/apps/miso/miso_settings_cluster.txt \
               --use-cluster \
               --chunk-jobs=1000

The main miso command will be split up and submitted via qsub to several different nodes. Only one core on each node will be used.
The node type set in miso_settings_cluster.txt is g8 nodes.
Users can watch the job load by running 'jobload -wc YourUserName' to determine whether a different node type is needed.
If so, copy miso_settings_cluster.txt to your own area, modify it, and change the full path passed to --settings-filename accordingly; see the sketch after these notes.
If you change the number of events per chunk (for example, --chunk-jobs=100), more qsub jobs will be submitted and each will finish faster, since each chunk contains fewer events.
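
A minimal sketch of working from a private copy of the settings file (my_miso_settings.txt is only an example name; edit the copy, e.g. the qsub node type, before running):

biowulf $ cp /usr/local/apps/miso/miso_settings_cluster.txt /data/$USER/miso/my_miso_settings.txt
biowulf $ miso --run ./indexed ./accepted_hits.bam \
               --output-dir outdir \
               --read-len 76 \
               --settings-filename=/data/$USER/miso/my_miso_settings.txt \
               --use-cluster \
               --chunk-jobs=1000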


Documentation

http://genes.mit.edu/burgelab/miso/docs/