Biowulf at the NIH
Meme & Mast on Biowulf
Meme is designed to discover motifs (highly conserved regions) in groups of related DNA or protein sequences, and Mast searches sequence databases using those motifs. Meme & Mast were developed at UCSD and Purdue. Meme/Mast website.

Meme is CPU-intensive for large numbers of sequences or long sequences. Short jobs are most easily run on Helix; larger datasets call for a parallel run on Biowulf.

How to run Meme on Biowulf

Before running Meme, you will need to load the Meme/Mast environment with 'module load meme'. This command will always load the latest installed version of Meme. To see what versions are available, or to load a particular version, use the 'module' commands as shown below. (More about environment modules)

[user@biowulf ~]$ module avail meme

------------------------- /usr/local/Modules/3.2.9/modulefiles ------------------------
meme/4.6.1      meme/4.7.0      meme/4.8.1      meme/4.9.0

[user@biowulf ~]$ module load meme/4.7.0

[user@biowulf ~]$ module list
Currently Loaded Modulefiles:
  1) meme/4.7.0

The 'module load' command will set up the appropriate MPICH, MPICH2 or OpenMPI path that the Meme executable was built with.

Your input database should consist of a file containing sequences in fasta format. In the example below, the file is 'mini-drosoph.s'.

Maxsize parameter: the maximum dataset size in characters. To find the number of characters in your dataset, type 'wc -c filename', for example:

[user@biowulf mydir]$ wc -c mini-drosoph.s
506016 mini-drosoph.s

For this dataset, the maxsize parameter must be set to a value greater than 506,016, so we will use 600000.
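
The rounding step above can be sketched as a small shell function. This is only an illustration: the function name round_up_maxsize and the default step of 100000 are assumptions made here, not part of Meme or of the Biowulf setup.

```shell
# Hypothetical helper: round a byte count up to the next multiple of a step,
# so that the result is always strictly greater than the input size.
round_up_maxsize() {
    # $1 = size of the dataset in bytes, $2 = rounding step (default 100000)
    dataset_bytes=$1
    step=${2:-100000}
    echo $(( (dataset_bytes / step + 1) * step ))
}

# Using the byte count reported by wc in the example above:
round_up_maxsize 506016    # prints 600000

# With a real file, the count could be fed in directly:
#   round_up_maxsize "$(wc -c < mini-drosoph.s)"
```

Because the function always moves to the next multiple, an input that is already an exact multiple of the step still yields a larger value, in keeping with the "greater than" requirement.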

Important cautionary note: Please check your meme parameters and input file sizes before submitting jobs. Very large input file sizes are known to cause problems, and may crash the job and hang the allocated nodes. See forum discussion.
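
One way to act on this caution is a small guard run before submitting. The sketch below assumes a hypothetical helper named maxsize_ok. It checks only that the intended maxsize value exceeds the input file's byte count, and it does not attempt to define what counts as a "very large" input.

```shell
# Hypothetical helper (not part of Meme): succeed only if the intended
# -maxsize value is greater than the size of the input file in bytes.
maxsize_ok() {
    # $1 = input file, $2 = intended -maxsize value
    # The arithmetic expansion strips any padding that wc may print.
    input_bytes=$(( $(wc -c < "$1") ))
    [ "$2" -gt "$input_bytes" ]
}

# Example use before submitting (file name taken from the example on this page):
#   maxsize_ok mini-drosoph.s 600000 || echo "maxsize too small" >&2
```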

Batch script

Create a batch script along the following lines:

----  this file is called meme.batch ---------
#!/bin/bash
#PBS -N Meme
#PBS -m be
#PBS -j oe

module load meme/4.9.0

cd /data/username/mydir

`which mpirun` -machinefile $PBS_NODEFILE -np $np `which meme_p` mini-drosoph.s \
  -oc meme_out -maxsize 600000 -p $np

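Submit this job with a command along the lines of:

qsub -v np=64 -l nodes=4:e2666 meme.batch

This command submits the Meme run to 64 processors on 4 e2666 nodes (16 cores each). The value passed as np is available inside the script as $np, which sets both the mpirun process count and Meme's -p option.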
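
The relationship between node count, cores per node and np can be made explicit with a little shell arithmetic. This is a sketch only: the variable names are chosen here for illustration, and the command is printed rather than executed so it can be inspected first.

```shell
# Values from the example above: 4 e2666 nodes with 16 cores each.
nodes=4
cores_per_node=16

# Total MPI processes = nodes x cores per node.
np=$(( nodes * cores_per_node ))

# Build the submission command as a string (meme.batch is the script name
# used in the batch script example) and print it for review.
submit_cmd="qsub -v np=${np} -l nodes=${nodes}:e2666 meme.batch"
echo "$submit_cmd"    # prints: qsub -v np=64 -l nodes=4:e2666 meme.batch
```

Keeping np derived from the node request avoids asking mpirun for more processes than the allocated cores.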

Meme scales well, and large Meme jobs (maxsize ~500,000) can be submitted on up to 128 processors.

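Because the batch script above uses '#PBS -j oe', standard output and standard error from the job are joined into a single file, Meme.oJobNum, where JobNum is the job number. (Without that directive, standard error would go to a separate Meme.eJobNum file.) If the job does not appear to be running correctly, check this file for errors.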
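
Checking the job output for errors can be partly automated with grep. The helper below is a hypothetical sketch: the function name and the list of search words are assumptions, and real Meme or MPI failures may use different wording.

```shell
# Hypothetical helper: list lines in a job output file that mention common
# failure words, with line numbers. The word list is an assumption, not an
# exhaustive catalogue of Meme or MPI error messages.
scan_job_log() {
    # $1 = job output file, e.g. Meme.o followed by the job number
    # -i ignores case, -n prefixes line numbers, -E enables alternation
    grep -inE 'error|abort|killed|segmentation fault' "$1"
}

# Example use (replace JobNum with the actual job number):
#   scan_job_log Meme.oJobNum || echo "no matching lines found"
```

grep exits with a non-zero status when nothing matches, so the function can also be used directly in a shell conditional.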

Documentation
  1. Type 'meme' or 'mast' with no parameters on the command line to see a list of all available options and more information.
  2. Meme documentation at the SDSC website.
  3. Mast documentation at the SDSC website.