Biowulf at the NIH
PAML on Biowulf
PAML is a package of programs for phylogenetic analyses of DNA or protein sequences using maximum likelihood. It is maintained and distributed for academic use free of charge by Ziheng Yang. [PAML website]

Note that the PAML programs are single-threaded. Biowulf therefore offers an advantage for PAML only if you can use multiple processors by running many PAML jobs simultaneously.

Running a PAML job on Biowulf

There are several example input files in the directory /usr/local/apps/paml/paml4.7/examples. The scripts below use the MouseLemurs data files from that directory.
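To experiment with the example files without touching the shared copy, they can first be copied into a working directory of your own. The following is a minimal sketch, not an official Biowulf procedure: stage_examples is a hypothetical helper, and both the MouseLemurs subdirectory name and the /data/$USER/paml destination in the commented usage line are assumptions about the layout.

```shell
#!/bin/bash
# Sketch: copy one set of PAML example files into a private working directory.
# stage_examples is a hypothetical helper, not a Biowulf or PAML command.
stage_examples() {
    local src="$1"    # directory holding the example files
    local dest="$2"   # working directory to create
    if [ ! -d "$src" ]; then
        echo "example directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # "$src"/. copies the contents of the directory, not the directory itself
    cp -r "$src"/. "$dest"/
}

# Assumed layout: the MouseLemurs files sit in a subdirectory of that name.
# stage_examples /usr/local/apps/paml/paml4.7/examples/MouseLemurs /data/$USER/paml/MouseLemurs
```

The function returns a non-zero status when the source directory does not exist, so a wrong path is reported instead of leaving an empty working directory behind.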

First create the control files for the PAML programs you intend to run (e.g. baseml.ctl, codeml.ctl). Then create a batch script along the following lines:

--------  this file is called run.bat -----------------------
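#!/bin/bash
# Job name shown in the queue
#PBS -N PAMLrun
# Send mail when the job begins (b) and ends (e)
#PBS -m be

# Run from the directory the job was submitted from;
# baseml and codeml read baseml.ctl and codeml.ctl from the current directory
cd $PBS_O_WORKDIR
baseml
codeml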

Submit this job with:

qsub -l nodes=1 run.bat

If this job requires more than the default 1 GB of memory, specify a node type with more memory on the qsub command line (use 'freen' to see the available node types). For example, if your job requires 10 GB of memory, submit it to a g24 node (24 GB of memory) with:

qsub -l nodes=1:g24 run.bat

Submitting a swarm of PAML jobs

Because PAML is single-threaded, the advantage of running it on Biowulf is that you can run a large number of PAML jobs simultaneously. The easiest way to do this is with swarm. Set up a swarm command file like the following:

cd /data/$USER/paml/set1; baseml; codeml
cd /data/$USER/paml/set2; baseml; codeml
cd /data/$USER/paml/set3; baseml; codeml

Submit this swarm with:

swarm -g 5 -f swarmfile --module paml/4.7

where '-g 5' tells swarm that each command (one line in the file above) requires 5 GB of memory.
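With many data sets, writing the swarm command file by hand becomes tedious. The sketch below generates one line per data-set directory in the same form as the example above. make_swarmfile is a hypothetical helper, and it assumes the directories are named set1, set2, ... under a common parent and that each already contains its own baseml.ctl and codeml.ctl.

```shell
#!/bin/bash
# Sketch: write one swarm command line per data-set directory.
# make_swarmfile is a hypothetical helper; the set* naming is an assumption.
make_swarmfile() {
    local base="$1"   # parent directory containing set1, set2, ...
    local out="$2"    # swarm command file to write
    local dir
    : > "$out"        # start with an empty file
    for dir in "$base"/set*/; do
        [ -d "$dir" ] || continue   # skip when the glob matched nothing
        # ${dir%/} strips the trailing slash left by the glob
        echo "cd ${dir%/}; baseml; codeml" >> "$out"
    done
}

# Example use, followed by the submission command from above:
# make_swarmfile /data/$USER/paml swarmfile
# swarm -g 5 -f swarmfile --module paml/4.7
```

The glob expands in lexical order, so set10 is listed before set2. This does not matter to swarm, since each line runs as an independent command.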

Documentation

PAML User Guide (PDF)

PAML FAQ (PDF)