Meme & Mast on Biowulf
Meme is designed to discover motifs (highly
conserved regions) in groups of related DNA or protein sequences, and Mast will
search sequence databases using motifs. Meme & Mast were developed at UCSD
and Purdue. Meme/Mast website.
Meme is CPU-intensive for large numbers of sequences or long sequences. Short jobs are most easily run on Helix; for larger datasets, a parallel run on Biowulf is appropriate.
How to run Meme on Biowulf
Your input database should consist of a file containing sequences in fasta format. In the example below, the file is 'mini-drosoph.s'.
Maxsize parameter: The maximum dataset size in characters. Determine the number of characters in your dataset by typing 'wc -c filename', e.g.
[user@biowulf mydir]$ wc -c mini-drosoph.s
506016 mini-drosoph.s
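The character count can be turned into a -maxsize value with a small helper. The following is only a sketch: the function name suggest_maxsize and the default 20% margin are illustrative assumptions, not part of Meme. Note that 'wc -c' counts every byte in the file, including fasta header lines and newlines, so the result is already somewhat larger than the number of sequence characters.

```shell
# Hypothetical helper (not part of Meme): print the file's character count
# plus a safety margin, for use as the value of -maxsize.
# Usage: suggest_maxsize FILE [MARGIN_PERCENT]   (margin defaults to 20)
suggest_maxsize() {
    file=$1
    pct=${2:-20}
    # wc -c counts all bytes, including header lines and newlines
    chars=$(wc -c < "$file")
    # integer arithmetic: count + count * pct / 100
    echo $(( chars + chars * pct / 100 ))
}

# Example call (uncomment in a directory that contains the file):
# suggest_maxsize mini-drosoph.s
```

For the 506016-character example file, a 20% margin gives 607219, which is in line with the value of 600000 used in the batch script on this page.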
Set up a batch script along the lines of the one below:
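------- this file is called meme.batch -----------------
#!/bin/csh
#PBS -N Meme
#PBS -m be
#PBS -j oe
# $np is not set in this script; it is passed in with 'qsub -v np=...'
setenv PATH /usr/local/mpich/bin:$PATH
cd /data/user/mydir/
# run the parallel version of meme on $np processors
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/meme/bin/meme_p \
/data/user/mydir/mini-drosoph.s -oc /data/user/mydir/meme_out \
-maxsize 600000 -p $np
# run mast on the meme output
mast /data/user/mydir/meme.txt -text
---------------------------------------------------------
Submit this script using
qsub -v np=32 -l nodes=16 meme.batch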
Meme scales well, and large meme jobs (maxsize ~500,000) can be submitted on up to 128 processors.
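When changing the processor count, the np and nodes values in the qsub command need to stay consistent. The sketch below assumes two processes per node, which is only inferred from the np=32, nodes=16 example on this page; the function name meme_qsub_cmd is hypothetical. It prints the command rather than submitting it, so the line can be checked first.

```shell
# Hypothetical helper: print a qsub command line for a given processor count.
# Usage: meme_qsub_cmd NP [PROCS_PER_NODE]
# PROCS_PER_NODE defaults to 2, an assumption inferred from the
# np=32 / nodes=16 example; adjust it to match the nodes actually used.
meme_qsub_cmd() {
    np=$1
    per_node=${2:-2}
    # round up so that every process has a slot on some node
    nodes=$(( (np + per_node - 1) / per_node ))
    echo "qsub -v np=$np -l nodes=$nodes meme.batch"
}

meme_qsub_cmd 32
```

Under the two-per-node assumption, a 128-processor job would request 64 nodes.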
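The standard output and standard error from the job will appear in files named Meme.oJobNum and Meme.eJobNum, where JobNum is the job number. Because the batch script above includes '#PBS -j oe', standard error is merged into the standard output file, so only Meme.oJobNum is written for that script. If the job does not appear to be running correctly, check these files for errors.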
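A quick way to check the job output is to search it for common failure messages. The sketch below is an illustration only: the function name scan_job_output and the list of search patterns are assumptions, and the real messages depend on what went wrong, so reading the whole file is still the more reliable check.

```shell
# Hypothetical helper: print lines in the given job output files that
# look like errors, in the form file:line-number:text.
# The pattern list is illustrative, not exhaustive.
scan_job_output() {
    for f in "$@"; do
        # skip names that do not exist (e.g. an unmatched wildcard)
        [ -f "$f" ] || continue
        # the extra /dev/null argument makes grep print the file name
        grep -i -n -E 'error|killed|not found|no such file' "$f" /dev/null || true
    done
}

# Scan any Meme job output files in the current directory
scan_job_output Meme.o* Meme.e*
```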
Documentation
- Type 'meme' or 'mast' with no parameters on the command line to see a list of all available options and more information.
- Meme documentation at the SDSC website.
- Mast documentation at the SDSC website.


