TREE-PUZZLE is
a computer program to reconstruct phylogenetic trees from molecular sequence
data by maximum likelihood. It implements a fast tree search algorithm, quartet
puzzling, that allows analysis of large data sets and automatically assigns
estimates of support to each internal branch. TREE-PUZZLE also computes
pairwise maximum likelihood distances as well as branch lengths for
user-specified trees. Branch lengths can also be calculated under the
clock assumption. In addition, TREE-PUZZLE offers likelihood mapping, a method
to investigate the support of a hypothesized internal branch without computing
an overall tree and to visualize the phylogenetic content of a sequence
alignment. TREE-PUZZLE also conducts a number of statistical tests on the data
set (chi-square test for homogeneity of base composition, likelihood ratio to
test the clock hypothesis, Kishino-Hasegawa test). The models of substitution
provided by TREE-PUZZLE are TN, HKY, F84, SH for nucleotides, Dayhoff, JTT,
mtREV24, BLOSUM 62, VT, WAG for amino acids, and F81 for two-state data. Rate
heterogeneity is modeled by a discrete Gamma distribution and by allowing
invariable sites. The corresponding parameters can be inferred from the data
set.
Tree-Puzzle Documentation (PDF)
Tree-Puzzle website
Tree-Puzzle on Biowulf has been built with MPI for parallel runs. To submit a job on Biowulf, create a command file similar to the following:
-------------------Sample command file for Tree-Puzzle-----------------------
#!/bin/csh
#PBS -N Ppuzzle
#PBS -m be
#PBS -k oe
set path = (/usr/local/mpich/bin $path)
cd /data/username/tree/
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/ppuzzle << EOF
EF.phy
n
1000000
e
b
y
EOF
-----------------------------------------------------------------------------
Submit this job using the qsub command, e.g.:
qsub -v np=5 -l nodes=2 command-file
where 'command-file' is the file you created above.
Note: although there are only 4 processors on the allocated nodes, the master process of Tree-Puzzle uses very little CPU time. It is therefore acceptable to 'overload' the nodes by running N+1 processes (5, in this case).
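The N+1 rule is simple arithmetic: one process per processor, plus one for the lightly loaded master. A minimal Python sketch of that calculation follows. It assumes two processors per node (implied by 4 processors on 2 nodes), and both function names are hypothetical helpers, not part of Tree-Puzzle or PBS.

```python
def mpi_process_count(nodes, cpus_per_node=2):
    """Return N+1: one process per processor plus one for the master.

    cpus_per_node=2 is an assumption taken from the example above
    (4 processors on 2 nodes); adjust it for other node types.
    """
    return nodes * cpus_per_node + 1


def qsub_command(nodes, command_file, cpus_per_node=2):
    """Build a qsub line of the same form as the example above."""
    np = mpi_process_count(nodes, cpus_per_node)
    return f"qsub -v np={np} -l nodes={nodes} {command_file}"


print(qsub_command(2, "command-file"))
# prints: qsub -v np=5 -l nodes=2 command-file
```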
Tree-Puzzle has many options. A summary is below:
GENERAL OPTIONS
 b  Type of analysis?                   Tree reconstruction
 k  Tree search procedure?              Quartet puzzling
 v  Approximate quartet likelihood?     No
 u  List unresolved quartets?           No
 n  Number of puzzling steps?           1000
 j  List puzzling step trees?           No
 o  Display as outgroup?                Gibbon
 z  Compute clocklike branch lengths?   No
 e  Parameter estimates?                Approximate (faster)
 x  Parameter estimation uses?          Neighbor-joining tree
SUBSTITUTION PROCESS
 d  Type of sequence input data?        Nucleotides
 m  Model of substitution?              HKY (Hasegawa et al. 1985)
 t  Transition/transversion parameter?  Estimate from data set
 f  Nucleotide frequencies?             Estimate from data set
RATE HETEROGENEITY
 w  Model of rate heterogeneity?        Uniform rate

Options are specified in the command file by simply entering the interactive menu options and values as needed. For example, to change the number of puzzling steps in your run to 8000, the command file would look like:
--------------------------------------------------------
#!/bin/csh
#PBS -N Ppuzzle
#PBS -m be
#PBS -k oe
set path = (/usr/local/mpich/bin $path)
cd /data/username/tree/
mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/ppuzzle << EOF
primates.b
n
8000
y
EOF
----------------------------------------------------
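Both command files follow the same pattern: the input file name, then the menu keys and values in the order they would be typed, then a final y to confirm the settings. A minimal Python sketch that generates such a file from a list of keystrokes is given here. The helper name build_command_file is hypothetical, and the paths and PBS directives are copied from the examples above rather than checked against any particular system.

```python
def build_command_file(alignment, keystrokes, workdir="/data/username/tree/"):
    """Return the text of a PBS command file that feeds menu input to ppuzzle.

    alignment  : name of the input file (first line sent to the program)
    keystrokes : menu keys and values in the order they would be typed
    workdir    : directory holding the input file (placeholder path from the examples)
    """
    # The trailing "y" confirms the settings, as in the examples above.
    menu_input = [alignment, *keystrokes, "y"]
    lines = [
        "#!/bin/csh",
        "#PBS -N Ppuzzle",
        "#PBS -m be",
        "#PBS -k oe",
        "set path = (/usr/local/mpich/bin $path)",
        f"cd {workdir}",
        "mpirun -machinefile $PBS_NODEFILE -np $np /usr/local/bin/ppuzzle << EOF",
        *menu_input,
        "EOF",
    ]
    return "\n".join(lines) + "\n"


# Reproduce the 8000-step example: press "n", then enter the new value.
script = build_command_file("primates.b", ["n", "8000"])
print(script)
```

Writing the returned text to a file and passing that file to qsub gives the same job as the hand-written version; only the keystroke list changes between runs.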


