Biowulf at the NIH
FastMEDUSA

Description

FastMEDUSA is a parallel program that infers gene regulatory networks from gene expression data and promoter sequences. It is based on MEDUSA (Kundaje et al., 2008, PLoS Comput. Biol.). When using FastMEDUSA, please cite Serdar Bozdag et al., Bioinformatics, 2010, 26(14):1792-1793.

How to Use

To use FastMEDUSA, set up your environment with modules:

module load FastMEDUSA

FastMEDUSA requires MPICH2 and graphviz. Loading the FastMEDUSA module sets up the environment for both automatically.
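After loading the module, you can confirm that the required executables are on your PATH with a small check like the one below. This is a sketch, not part of FastMEDUSA. The function name check_tools is hypothetical, and the tool list comes from the commands used on this page.

```shell
#!/bin/bash
# Hypothetical helper (not part of FastMEDUSA): report any of the given
# commands that are not on PATH. Returns 0 if all are found, 1 otherwise.
check_tools() {
  local missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage after "module load FastMEDUSA":
#   check_tools FastMEDUSA mpd mpdboot mpiexec mpdallexit dot || echo "environment incomplete"
```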

Absolute paths to the input data must be given. Example input can be found in /usr/local/apps/FastMEDUSA/1.1/data.
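Because the inputs must be given as absolute paths, one option is to resolve relative paths with readlink -f (GNU coreutils) before writing them into a job script, and to check that each file exists. The function abs_input below is a hypothetical helper, not part of FastMEDUSA, and the file name in the usage comment is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: print the absolute path of an existing input file,
# or print an error and return 1 if the file does not exist.
abs_input() {
  if [ ! -f "$1" ]; then
    echo "input file not found: $1" >&2
    return 1
  fi
  readlink -f -- "$1"
}

# Usage (placeholder file name):
#   TARGET=$(abs_input data/target_gene_expression.csv) || exit 1
#   echo "$TARGET"
```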

Single Node qsub

Here is an example qsub script for submitting to a single node:

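#!/bin/bash

# This script is run_single.sh

#PBS -N run_single
#PBS -o run_single.o
#PBS -e run_single.e

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
module load FastMEDUSA

# Start a local MPICH2 mpd daemon and give it a moment to come up
mpd &
sleep 3

# np is passed in at submission time with qsub -v np=...
mpiexec -n $np FastMEDUSA \
 -c /absolute/path/to/target_gene_expression.csv \
 -p /absolute/path/to/regulator_gene_expression.csv \
 -s /absolute/path/to/promoter_seq.fasta \
 -i 400 \
 -D "$PBS_O_WORKDIR" \
 -r run_single

# Shut down the mpd daemon once the run has finished
mpdallexit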

This script can be run on a single node with 16 cores with the following command:

qsub -l nodes=1:c16 -v np=16 run_single.sh

Multiple Node qsub

For large data sets, running on multiple nodes increases the number of available processors and can lower the overall runtime. NOTE: The number of MPI processes (np) must be less than the number of genes!
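Before choosing np, it can help to count the genes and check the limit automatically. This sketch ASSUMES that the promoter FASTA file contains exactly one sequence (one '>' header line) per gene; verify that against your own input. The function name check_np is hypothetical.

```shell
#!/bin/bash
# Hypothetical pre-submission check: require np to be less than the number
# of genes. ASSUMPTION: the promoter FASTA has one '>' header line per gene.
check_np() {
  local np=$1 fasta=$2 genes
  # grep -c prints the match count; it exits non-zero if there are no
  # matches or the file cannot be read, so treat that as a failure.
  genes=$(grep -c '^>' "$fasta") || return 1
  if [ "$np" -ge "$genes" ]; then
    echo "np=$np must be less than the number of genes ($genes)" >&2
    return 1
  fi
  echo "ok: np=$np, genes=$genes"
}

# Usage:
#   check_np 64 /absolute/path/to/promoter_seq.fasta && \
#     qsub -l nodes=4:c16 -v np=64 run_multi.sh
```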

#!/bin/bash

# This script is run_multi.sh

#PBS -N run_multi
#PBS -o run_multi.o
#PBS -e run_multi.e

# Run from the directory the job was submitted from
cd "$PBS_O_WORKDIR"
module load FastMEDUSA

# Boot an mpd ring on the nodes listed in $PBS_NODEFILE
mpdboot -f $PBS_NODEFILE -n $(wc -l < $PBS_NODEFILE) &
sleep 3

# np is passed in at submission time with qsub -v np=...
mpiexec -n $np FastMEDUSA \
  -c /absolute/path/to/target_gene_expression.csv \
  -p /absolute/path/to/regulator_gene_expression.csv \
  -s /absolute/path/to/promoter_seq.fasta \
  -i 400 \
  -D "$PBS_O_WORKDIR" \
  -r run_multi

# Shut down the mpd ring on all nodes
mpdallexit

This script can be run on four c16 nodes, for a total of 64 cores, with the following command:

qsub -l nodes=4:c16 -v np=64 run_multi.sh
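Since each c16 node provides 16 cores, np is the node count times 16 (4 × 16 = 64 in the command above). A small wrapper can compute it so the two values stay consistent. The function qsub_cmd is a hypothetical helper; it only prints the qsub command (a dry run) so you can inspect it before submitting.

```shell
#!/bin/bash
# Hypothetical helper: build the qsub command for a run on N c16 nodes,
# setting np = N * 16 so each core runs one MPI process.
qsub_cmd() {
  local nodes=$1 script=$2
  local cores_per_node=16
  local np=$(( nodes * cores_per_node ))
  echo "qsub -l nodes=${nodes}:c16 -v np=${np} ${script}"
}

# Usage: print the command, check it, then submit it
#   qsub_cmd 4 run_multi.sh    # prints: qsub -l nodes=4:c16 -v np=64 run_multi.sh
#   eval "$(qsub_cmd 4 run_multi.sh)"
```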

Visualizing the Results

The final results can be visualized with dot, an executable from the graphviz package. For example, for the multi-node run above:

generate_ADT -p run_multi
dot -Tpdf run_multi/run_multi.dot -o run_multi.pdf
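The same two steps can be wrapped in a function to render several runs, or to produce other graphviz output formats such as png or svg. The function render_run is hypothetical, and it ASSUMES, as the commands above suggest, that generate_ADT -p NAME writes NAME/NAME.dot.

```shell
#!/bin/bash
# Hypothetical wrapper around the generate_ADT and dot commands above.
# ASSUMPTION: "generate_ADT -p NAME" writes NAME/NAME.dot.
render_run() {
  local run=$1 fmt=${2:-pdf}
  generate_ADT -p "$run" || return 1
  dot -T"$fmt" "$run/$run.dot" -o "$run.$fmt"
}

# Usage:
#   for run in run_single run_multi; do
#     render_run "$run" pdf
#   done
#   render_run run_multi svg
```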

Documentation