Biowulf at the NIH
Meme & Mast on Biowulf
Meme is designed to discover motifs (highly conserved regions) in groups of related DNA or protein sequences, and Mast will search sequence databases using motifs. Meme & Mast were developed at UCSD and Purdue. Meme/Mast website.

Meme is CPU-intensive for large numbers of sequences or long sequences. Short jobs are most easily run on Helix, but larger datasets call for a parallel run on Biowulf.

How to run Meme on Biowulf

Your input database should consist of a file containing sequences in fasta format. In the example below, the file is 'mini-drosoph.s'.
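Before submitting a job, it can be worth confirming that the input file looks like fasta, in which every sequence starts with a header line beginning with '>'. The sketch below counts those header lines. The function name count_fasta_records is hypothetical and is not part of Meme.

```shell
# Hypothetical helper: count fasta records by counting header lines.
# In fasta format, each sequence starts with a line beginning with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}
```

Running 'count_fasta_records mini-drosoph.s' prints the number of sequences in the file. A count of 0 suggests the file is not in fasta format.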

Maxsize parameter: the maximum dataset size in characters. To find the number of characters in your dataset, type 'wc -c filename'. For example:

[user@biowulf mydir]$ wc -c mini-drosoph.s 
506016 mini-drosoph.s
For this dataset, the maxsize parameter has to be set to greater than 506,016, so we will use 600000.
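The rounding step above can be sketched as a small shell function. This is only an illustration: the name suggest_maxsize and the choice to round up to the next multiple of 100000 are assumptions, and any value larger than the dataset size would do.

```shell
# Hypothetical helper: round a character count up to the next multiple
# of 100000, so the result is always strictly greater than the input.
suggest_maxsize() {
    size=$1
    echo $(( (size / 100000 + 1) * 100000 ))
}

# The count reported by 'wc -c mini-drosoph.s' in the example above.
suggest_maxsize 506016    # prints 600000
```

To use the file directly, 'wc -c < mini-drosoph.s' prints only the number, so the call becomes: suggest_maxsize $(wc -c < mini-drosoph.s)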

Set up a batch script along the lines of the one below:

>------- this file is called meme.batch-----------------
#!/bin/csh
#PBS -N Meme
#PBS -m be
#PBS -j oe

setenv PATH /usr/local/mpich/bin:$PATH

cd /data/user/mydir/
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/meme/bin/meme_p \
     /data/user/mydir/mini-drosoph.s -oc /data/user/mydir/meme_out \
     -maxsize 600000 -p $np
mast /data/user/mydir/meme_out/meme.txt -text
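Submit this script using the command below. The value of np passed with -v is the total number of MPI processes (the $np used in the script), here 32 processes on 16 nodes: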
qsub -v np=32 -l nodes=16 meme.batch
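In the example, np is twice the number of nodes. The sketch below builds the matching qsub command for a different node count. It assumes two processors per node, which is inferred from the example only and may not hold for every node type; the function name meme_qsub_command is hypothetical. It prints the command rather than running it.

```shell
# Hypothetical helper: print (do not run) the qsub command for a node count.
# procs_per_node defaults to 2, an assumption inferred from np=32, nodes=16.
meme_qsub_command() {
    nodes=$1
    procs_per_node=${2:-2}
    np=$(( nodes * procs_per_node ))
    echo "qsub -v np=$np -l nodes=$nodes meme.batch"
}

meme_qsub_command 16    # prints: qsub -v np=32 -l nodes=16 meme.batch
```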

Meme scales well, and large meme jobs (maxsize ~500,000) can be submitted on up to 128 processors.

Because the batch script sets '#PBS -j oe', the standard output and standard error from the job are merged into a single file, Meme.oJobNum. If the job does not appear to be running correctly, check this file for errors.

Documentation
  1. Type 'meme' or 'mast' with no parameters on the command line to see a list of all available options and more information.
  2. Meme documentation at the SDSC website.
  3. Mast documentation at the SDSC website.