Biowulf at the NIH
BEAST on Biowulf

BEAST (Bayesian Evolutionary Analysis Sampling Trees) is a cross-platform program for Bayesian MCMC analysis of molecular sequences. It is entirely orientated towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted in proportion to its posterior probability. We include a simple-to-use user-interface program for setting up standard analyses and a suite of programs for analysing the results.

BEAST is a single-threaded program. It is only advantageous to run BEAST on Biowulf if you need to run a large number of BEAST jobs simultaneously.

Submitting a swarm of BEAST jobs

Please set your environment using environment modules. When using swarm, this is easily done with the --module option.

Create a swarm command file with a single line for each run. Sample file:

----- this file is beast.swarm------------
beast file1.xml
beast file2.xml
beast file3.xml
beast file4.xml
[...]

Submit this swarm of jobs with the command:

swarm -f beast.swarm --module BEAST
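
When there are many input files, the swarm command file can be generated instead of typed by hand. The following is a minimal sketch, assuming all BEAST XML inputs sit in one directory; `make_beast_swarm` is a hypothetical helper name, not part of swarm or BEAST.

```shell
#!/bin/bash
# Sketch: write one "beast <file>" line per XML input into a swarm file.
# make_beast_swarm is a hypothetical helper, not a swarm or BEAST command.
#   $1 = directory holding the XML inputs (default: current directory)
#   $2 = swarm command file to write     (default: beast.swarm)
make_beast_swarm() {
    local dir=${1:-.} out=${2:-beast.swarm}
    local xml
    : > "$out"                          # create or truncate the command file
    for xml in "$dir"/*.xml; do
        [ -e "$xml" ] || continue       # skip the literal pattern if nothing matched
        printf 'beast %s\n' "$xml" >> "$out"
    done
}
```

Calling `make_beast_swarm . beast.swarm` in the directory holding the inputs produces a file that can be passed to the swarm command shown above.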
BEAST with Beagle-lib on GPUs

BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of GPUs.

Note: using '-beagle_single' can lead to numerical underflow and errors for larger data sets. Using '-beagle_double' slows down execution but avoids this problem. (Thanks to Kurt Wollenberg, NIAID, for this tip.)

A single BEAST+Beagle run uses only one GPU. The Biowulf GPU nodes have 2 GPUs each, so the script below runs two separate instances of BEAST+Beagle to fully utilize the GPUs.

Sample batch script using two of the Beast benchmark sets as input:

#!/bin/bash
#PBS -N beast-beagle

module load BEAST/1.7.5-gpu

cd $PBS_O_WORKDIR
cp /usr/local/apps/BEAST/1.7.5/examples/Benchmarks/benchmark1.xml .
cp /usr/local/apps/BEAST/1.7.5/examples/Benchmarks/benchmark2.xml .
beast -beagle_double -seed 123456 -beagle_GPU benchmark1.xml > benchmark1.out 2>&1 &
beast -beagle_double -seed 123456 -beagle_GPU benchmark2.xml > benchmark2.out 2>&1 &
wait

Submit this job with:

qsub -l nodes=1:gpu2050 jobscript
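
A bare `wait`, as in the batch script above, returns once both background runs have finished but does not report whether either of them failed. The following is a minimal sketch of one way to capture that, assuming each run can be written as a single command string; `run_all` is a hypothetical helper, not a BEAST or PBS feature.

```shell
#!/bin/bash
# Sketch: start each command in the background, then wait on every PID
# individually so a failed run is not silently ignored.
# run_all is a hypothetical helper; each argument is one full command line.
run_all() {
    local pids=() cmd pid failed=0
    for cmd in "$@"; do
        bash -c "$cmd" &                # launch in the background
        pids+=("$!")                    # remember its process ID
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))   # count non-zero exit statuses
    done
    echo "$failed job(s) failed"
    [ "$failed" -eq 0 ]                 # return non-zero if anything failed
}
```

In the batch script, the two `beast ... &` lines and the final `wait` could then be replaced by one `run_all` call with the two command lines passed as quoted strings.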

You can determine whether the GPU is being used in 2 ways:

  1. Standard output from the job.
                      BEAST v1.7.5, 2002-2013
           Bayesian Evolutionary Analysis Sampling Trees
    [...etc...]
    Using strict molecular clock model.
    Creating state frequencies model 'frequencies': Initial frequencies = {0.25, 0.25, 0.25, 0.25}
    Creating HKY substitution model. Initial kappa = 2.0
    Creating site model.
    Using BEAGLE TreeLikelihood
      Branch rate model used: strictClockBranchRates
      Using BEAGLE resource 1: Tesla M2050
        Global memory (MB): 2687
        Clock speed (Ghz): 1.15
        Number of cores: 448
        with instance flags:  PRECISION_SINGLE COMPUTATION_SYNCH EIGEN_REAL SCALING_MANUAL SCALERS_RAW VECTOR_NONE THREADING_NONE PROCESSOR_GPU
    [...etc...]
    

  2. Use the 'nvidia-smi' command on the allocated GPU node. This will report the actual compute processes on the GPU.
    [susanc@biowulf ~]$ rsh p83 nvidia-smi
    Fri Jun 14 14:06:48 2013       
    +------------------------------------------------------+                       
    | NVIDIA-SMI 3.295.33   Driver Version: 295.33         |                       
    |-------------------------------+----------------------+----------------------+
    | Nb.  Name                     | Bus Id        Disp.  | Volatile ECC SB / DB |
    | Fan   Temp   Power Usage /Cap | Memory Usage         | GPU Util. Compute M. |
    |===============================+======================+======================|
    | 0.  Tesla M2050               | 0000:02:00.0  Off    |         0          0 |
    |  N/A    N/A  P0    N/A /  N/A |   5%  122MB / 2687MB |   40%     Default    |
    |-------------------------------+----------------------+----------------------|
    | 1.  Tesla M2050               | 0000:03:00.0  Off    |         0          0 |
    |  N/A    N/A  P1    N/A /  N/A |   0%    6MB / 2687MB |    0%     Default    |
    |-------------------------------+----------------------+----------------------|
    | Compute processes:                                               GPU Memory |
    |  GPU  PID     Process name                                       Usage      |
    |=============================================================================|
    |  0.  8204     java                                                   114MB  |
    +-----------------------------------------------------------------------------+
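
The first check can be scripted against the saved job output (for example, benchmark1.out from the batch script). This is a minimal sketch that relies only on the two strings visible in the sample output above, the `Using BEAGLE resource` line and the `PROCESSOR_GPU` instance flag; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: inspect a saved BEAST output file for signs of GPU use.
# Both helpers are hypothetical and match only the text shown in the
# sample output; other BEAST versions may word these lines differently.

# Succeeds if the instance flags in the file include PROCESSOR_GPU.
uses_gpu() {
    grep -q 'PROCESSOR_GPU' "$1"
}

# Prints the device name from each "Using BEAGLE resource N: NAME" line.
beagle_resource() {
    sed -n 's/^[[:space:]]*Using BEAGLE resource [0-9]*: //p' "$1"
}
```

For example, `uses_gpu benchmark1.out && beagle_resource benchmark1.out` prints the device name only when the GPU flag is present.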
    
Documentation

BEAST wiki page