MrBayes is a program for the Bayesian estimation of phylogeny. Bayesian inference of phylogeny is based upon a quantity called the posterior probability distribution of trees, which is the probability of a tree conditioned on the observations. The conditioning is accomplished using Bayes's theorem. The posterior probability distribution of trees is impossible to calculate analytically; instead, MrBayes uses a simulation technique called Markov chain Monte Carlo (or MCMC) to approximate the posterior probabilities of trees.
The program takes as input a character matrix in NEXUS file format. The output is a set of files containing the parameters sampled by the MCMC algorithm. MrBayes can summarize the information in these files for the user. Program features include:
- Extensive help available via the command line;
- Ability to analyze nucleotide, amino acid, restriction site, and morphological data;
- Mixing of data types, such as molecular and morphological characters, in a single analysis;
- A general method for assigning parameters across data partitions;
- An abundance of evolutionary models, including 4 X 4, doublet, and codon models for nucleotide data and many of the standard rate matrices for amino acid data;
- Estimation of positively selected sites in a fully hierarchical Bayes framework;
- Distributed computing using MPI.
1. Create a script file containing the MrBayes commands, as below. In this example the script is /data/user/MYDIR/test.sh; change /data/user/MYDIR to the directory that holds your copy of the script and test.nex:
#!/bin/bash
#
#PBS -N MrBayes
#PBS -m be
#PBS -k oe
module load mrbayes
cd /data/user/MYDIR
`which mpirun` -machinefile $PBS_NODEFILE -np $np `which mb` test.nex
2. Now submit the script using the 'qsub' command and the -v option, e.g.
qsub -v np=8 -l nodes=4:o2800 /data/user/MYDIR/test.sh
where:
- np is the desired number of processors (2x the number of nodes, or 4x for dual-core nodes);
- nodes is the desired number of nodes (in this case, 4);
- o2800 is the desired type of processor;
- /data/user/MYDIR/test.sh is the script file created in step 1.
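The relationship between np and nodes in the qsub example can be sketched as a small shell helper. This is only an illustration: the function name build_qsub_cmd and its arguments are hypothetical, and it prints the command rather than submitting it, so the result can be checked first. The number of processors per node is an input you must supply for your node type.

```shell
#!/bin/bash
# Hypothetical helper (not part of MrBayes or PBS): builds the qsub command
# line from a node count and an assumed number of processors per node.
build_qsub_cmd() {
    local nodes=$1 procs_per_node=$2 node_type=$3 script=$4
    # np is the total processor count passed to the script via -v
    local np=$(( nodes * procs_per_node ))
    echo "qsub -v np=${np} -l nodes=${nodes}:${node_type} ${script}"
}

# 4 nodes with 2 processors each gives np=8, matching the qsub example.
build_qsub_cmd 4 2 o2800 /data/user/MYDIR/test.sh
```

Printing the command instead of running it keeps the sketch safe to try on any machine; once the output looks right, it can be pasted at the prompt.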
MrBayes is parallelized, and uses MPI to distribute heated and cold chains among the available processors. When run in parallel, each chain is run by a single processor. Thus, MrBayes cannot use more processors than there are chains. If you submit your MrBayes job to more processors than you have chains, you will see the error message:
> The number of chains must be at least as great
> as the number of processors (#)
It is possible to increase the number of chains (nchains) or the number of independent runs (nruns), and then submit to more processors. Increasing nruns and running on more processors will not speed up the calculation, since each independent run still takes the same amount of time to compute. It does, however, allow more independent runs to be evaluated at the same time, which can give a better result.
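As a rough check before submitting, the processor limit can be worked out from the run settings. The sketch below assumes that the total number of chains is nruns multiplied by nchains, and that this total is the upper bound on usable processors; the function name usable_procs is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: caps a requested processor count at the total number
# of chains, assumed here to be nruns * nchains.
usable_procs() {
    local nruns=$1 nchains=$2 requested=$3
    local total=$(( nruns * nchains ))
    if (( requested > total )); then
        echo "$total"
    else
        echo "$requested"
    fi
}

# With 2 runs of 4 chains each, a request for 16 processors is capped at 8.
usable_procs 2 4 16
```

If the capped value is lower than the number of processors you intended to request, either lower np or raise nchains or nruns in the MrBayes input.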
To run a large number of MrBayes jobs, with each job using multiple processors, set up a swarm command file along the following lines (this file is swarm.cmd):
cd /data/user/myjob/a1 ; `which mpirun` -machinefile $PBS_NODEFILE -np 4 `which mb` test1.nex
cd /data/user/myjob/a2 ; `which mpirun` -machinefile $PBS_NODEFILE -np 4 `which mb` test2.nex
cd /data/user/myjob/a3 ; `which mpirun` -machinefile $PBS_NODEFILE -np 4 `which mb` test3.nex
In the example above, a separate directory is used for each run for convenience. Each MrBayes run is set up to use 4 processors ('-np 4'), so the swarm command must also allocate 4 processors to each MrBayes run:
swarm -t 4 -f swarm.cmd --module mrbayes
The '-t 4' flag tells swarm that each command requires 4 cores. Note that swarm will not submit a single command to multiple nodes.
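For many runs, the swarm command file can be generated rather than typed by hand. The following is a minimal sketch that assumes the directory and file naming used in the swarm.cmd example (a1, a2, ... and test1.nex, test2.nex, ...); the function name make_swarm_cmds is hypothetical.

```shell
#!/bin/bash
# Hypothetical generator: prints one swarm command line per MrBayes run.
# Arguments: base directory, processors per run, number of runs.
make_swarm_cmds() {
    local base=$1 np=$2 count=$3 i
    for (( i = 1; i <= count; i++ )); do
        # Single quotes keep the backticks and $PBS_NODEFILE literal, so they
        # are expanded when the job runs, not when the file is generated.
        printf 'cd %s/a%d ; `which mpirun` -machinefile $PBS_NODEFILE -np %d `which mb` test%d.nex\n' \
            "$base" "$i" "$np" "$i"
    done
}

# Print three command lines, 4 processors each.
make_swarm_cmds /data/user/myjob 4 3
```

Redirecting the output to swarm.cmd produces a file that can be passed to swarm with -f. The value given for processors per run should match the value passed to swarm with -t.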
If you have a small MrBayes job, it is probably easiest to run on Helix. Occasionally, for debugging purposes, an interactive job may be run on Biowulf by allocating an interactive node. Please remember to exit from the node when done.
<biowulf>% qsub -I -l nodes=1
qsub: waiting for job 593807.biobos to start
qsub: job 593807.biobos ready

<p2>% module load mrbayes
<p2>% mrbayes

MrBayes v3.1.2

(Bayesian Analysis of Phylogeny)

(Parallel version)
(1 processors available)

by

John P. Huelsenbeck and Fredrik Ronquist

Section of Ecology, Behavior and Evolution
Division of Biological Sciences
University of California, San Diego
email@example.com

School of Computational Science
Florida State University
firstname.lastname@example.org

Distributed under the GNU General Public License

Type "help" or "help <command>" for information
on the commands that are available.

MrBayes > execute /usr/local/bench/mrbayes/arch107_L1000.nex

Executing file "/usr/local/bench/mrbayes/arch107_L1000.nex"
UNIX line termination
Longest line length = 1011
Parsing file
Expecting NEXUS formatted file
Reading data block
Allocated matrix
Matrix has 107 taxa and 1000 characters
Missing data coded as ?
Gaps coded as -
Data is Dna
Setting default partition (does not divide up characters).
Taxon 1 -> Har.maris2
Taxon 2 -> Har.maris1
Taxon 3 -> Har.mukoht
Taxon 4 -> Ntm.pharao
Taxon 5 -> AB012057
Taxon 6 -> AB012052
Taxon 7 -> AB012054
Taxon 8 -> Hc.salifo2
Taxon 9 -> Hb.cutirub
Taxon 10 -> AF071880
[....]
Taxon 101 -> U81774
Taxon 102 -> AB019720
Taxon 103 -> AF068822
Taxon 104 -> AB019721
Taxon 105 -> AB019719
Taxon 106 -> AB019715
Taxon 107 -> AB019717
Setting output file names to "/usr/local/bench/mrbayes/arch107_L1000.nex.run<i>.<p/t>"
Successfully read matrix
Exiting data block
Reached end of file

MrBayes > quit
Deleting matrix
Quitting program

<p2>% exit
<biowulf>%