![]() |
|
||||||||||||||||||||
| |
|||||||||||||||||||||
fastDNAml on BiowulffastDNAml computes the likelihood of various phylogenetic trees, starting with aligned DNA sequences from a number of species. It was developed by Gary Olsen and Ross Overbeek. Published work should cite:Olsen, G. J., Matsuda, H., Hagstrom, R., and Overbeek, R. 1994. fastDNAml: A tool for construction of phylogenetic trees of DNA sequences using maximum likelihood. Comput. Appl. Biosci. 10: 41-48. fastDNAml is a program derived from Joseph Felsenstein's version 3.3 DNAML (part of his PHYLIP package). Users should consult the documentation for DNAML before using this program. fastDNAml is an attempt to solve the same problem as DNAML, but to do so faster and using less memory, so that larger trees and/or more bootstrap replicates become tractable. Much of fastDNAml is merely a recoding of the PHYLIP 3.3 DNAML program from PASCAL to C. The input to fastDNAml is a single file in Phylip format. All options to the program are included in this file. See below for a full description of fastDNAml options. The documentation for fastDNAml is in /usr/local/src/fastDNAml/docs. Test data sets are available in /usr/local/src/fastDNAml/testdata
Running fastDNAml on BiowulfThe most complex part of using this program is to set up the input file. For this, you need to read the documentation (no way around this :-)). The documentation is at the end of this page. Also see the fastDNAml website.The program is parallelized with PVM, but a user-friendly script has been set up so that the user does not need to deal with PVM or the Biowulf batch system directly. Once the you have your input file, you simply type fastDNAml on the Biowulf command line and respond to the prompted questions. The script will set the appropriate environment variables, start up the PVM processes and submit your job. Jobs are submitted to 8 nodes (or sometimes fewer if the system is very busy and nodes are unavailable). The output files go into the current directory by default, but you can choose to put them in any directory. Sample session (user input is in bold):
fastDNAml options and Examples
The default input data format is Interleaved (see I option). To help get data
from a GenBank or similar format, the interleaved option can be switched off
with the I option. Numbers in the sequence data (i.e., sequence position
numbers) will be ignored, so they need not be stripped out.
(Note that the program also writes a file called checkpoint.PID. See the R
option below for more description.)
1 -- Print Data
By default, fastDNAml does not echo the sequence data to the output file.
Option 1 reverses this.
3 -- Do Not Print Tree
By default, fastDNAml prints the final tree to the output file. Option 3
reverses this.
4 -- Do Not Write Tree to File (***** Changed in version 1.1 *****)
By default, fastDNAml versions 1.1 and 1.2 write a machine readable (Newick
format) copy of the final tree to an output file. Option 4 reverses this.
The tree output file will be called treefile.PID (where PID is the process ID
under which fastDNAml is running). Look at the Y option below for more
information on alternative tree formats.
B -- Bootstrap
Generates a bootstrap sample of the input data. Requires auxiliary data line
of the form:
B random_number_seed
Example:
5 114 B
B 137
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
If the W option is used, only positions that have nonzero weights are used in
computing the bootstrap sample. Warning: For a given random number seed, the
sample will always be the same.
PHYLIP DNAML does not include a bootstrap option. (Use the SEQBOOT program.)
C -- Categories
Requires auxiliary data of the form:
C number_of_categories list_of_category_rates
The maximum number of categories is 35. This line is followed by a list of
the rates for each site:
Categories list_of_categories [per site, one or more lines]
Category "numbers" are ordered: 1, 2, 3, ..., 9, A, B, ..., Y, Z. Category
zero (undefined rate) is permitted at sites with a zero in a user-supplied
weighting mask.
Example:
5 114 C
C 12 0.0625 0.125 0.25 0.5 1 2 4 8 16 32 64 128
Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
633792246624457364222574877188898132984963499AA9899975
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
PHYLIP DNAML is limited to categories 1 through 9. Also, in PHYLIP version
3.3, the categories data came after all the other auxiliary data, but before
the user-supplied base frequencies and sequence data. If you make the C line
your last auxiliary data line, the programs will behave the same.
F -- Empirical Frequencies (***** Changed in version 1.1 *****)
By default (starting with version 1.1), the program uses base frequencies
derived from the sequence data (called emperical base frequencies). Therefore
the input file should normally NOT include a base frequencies line preceding
the data. If you want to include your own base freqency data, it is now
necessary to use the F option, and add a line to the input file that supplies
the frequency data:
Instructs the program to use user-supllied base frequencies derived from the
sequence data. Therefore the input file should not include a base frequencies
line IMMEDIATELY preceding the data:
5 114 F
0.25 0.30 0.20 0.25
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
There is an alternative format: the frequencies can be anywhere in the list of
auxilliary data lines if they are preceded by an F in the first column:
5 114 F C W
F 0.25 0.30 0.20 0.25
C ...
...
W ...
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
G -- Global
If the global option is specified, there may also be an [optional] auxiliary
data line of form:
G N1
or
G N1 N2
N1 is the number of branches to cross in rearrangements of the completed tree.
The value of N2 is the number of branches to cross in testing rearrangements
during the sequential addition phase of tree inference.
N1 = 1: local rearrangement (default without G option)
1 < N1 < numsp-3: regional rearrangements (crossing N1 branches)
N1>= numsp-3: global rearrangements (default with G option)
N2 <= N1 the default N2 is 1, local rearrangements.
The G option can also be used to force branch swapping on user trees, that is,
a combination of G and U options.
If the auxiliary line is supplied, it cannot be the last line of auxiliary
data. (It may be necessary to add the T option with an auxiliary data line of
T 2.0
if no other auxiliary data are used.)
Examples:
Do local rearrangements after each addition, and global after last addition:
5 114 G
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
Do local rearrangements after each addition, and regional (crossing 4
branches) after last addition:
5 114 G T
G 4
T 2.0
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
Do no rearrangements after each addition, and local after last addition:
5 114 G T
G 1 0
T 2.0
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
PHYLIP DNAML does not support the auxiliary data line or branch swapping on a
user tree.
I -- Not Interleaved
By default, fastDNAml 1.2 expects data lines for the various sequences in an
interleaved format (as did PHYLIP 3.3 DNAML). The I option reverses the
expected format (to non-interleaved data, in which all the data lines for one
sequence appear before the next sequence begins). This is particularly useful
for editing a GenBank or equivalent format into a valid input file (note that
numbers within the sequence data are ignored, so it is not necessary to remove
them).
If all the data for each sequence are on one line, then the interleaved and
non-interleaved formats are degenerate. (This is the way David Swofford's
PAUP program writes PHYLIP format output files.) The drawback is that many
programs do not handle long lines of text. This includes the vi and EDT text
editors, many electronic mail programs, and some versions of FTP for VAX/VMS
systems.
PHYLIP 3.3 DNAML expects interleaved data, and does not include an I option to
alter this. PHYLIP 3.4 DNAML accepts an I option, but the default format is
reversed.
J -- Jumble
Randomize the sequence addition order. Requires an auxiliary input line of
the form:
J random_number_seed
Example:
5 114 J
J 137
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
Note that fastDNAml explores a very small number of alternative tree
topologies relative to a typical parsimony program. There is a very real
chance that the search procedure will not find the tree topology with the
highest likelihood. Altering the order of taxon addition and comparing the
trees found is a fairly efficient method for testing convergence. Typically,
it would be nice to find the same best tree at least twice (if not three
times), as opposed to simply performing some fixed number of jumbles and
hoping that at least one of them will be the optimum.
K -- Keep multiple best trees (***** New in version 1.1 *****)
The program can keep a list of the best trees that it has found. When the
program is done, it prints a list of these, from best to worst, and prints
a Hasegawa and Kishino type test as to which trees are significantly worse
than the best tree found. When evaluating user-supplied trees, the program
automatically keeps all trees. In other situations, the program keeps only
the best tree that it has found. The K option, and associated auxilliary
data line, can be used to define an alternate number:
Example, to keep the 15 best trees found:
5 114 K
K 15
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
Example, to keep only the one best tree of possibly numerous user-supplied
trees:
5 114 K U
K 1
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
L -- User Lengths
Causes user trees to be read with branch lengths (and it is an error to omit
any of them). Without the L option, branch lengths in user trees are not
required, and are ignored if present.
Example:
5 114 U L
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
(The U is for user tree and the L for user lengths)
O -- Outgroup
Use the specified sequence number for the outgroup. Requires an auxiliary
data line of the form:
O outgroup_number
Example:
5 114 O
O 5
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
This option only affects the way the tree is drawn (and written to the
treefile).
Q -- Quickadd (***** Changed in version 1.1 *****)
The quickadd feature greatly decreases the time in initially placing a new
sequence in the growing tree (but does not change the time required to
subsequently test rearrangements). The overall time savings seems to be about
30%, based on a number of test cases. Its downside, if any, is unknown. This
is now (starting in version 1.1) the default program behavior.
If the analysis is run with a global option of "G 0 0", so that no
rearrangements are permitted, the tree is built very approximately, but very
quickly. This may be of greatest interest if the question is, "Where does
this one new sequence fit into this known tree? The known tree is provided
with the restart option (below).
PHYLIP DNAML does not include anything comparable to the quickadd feature.
The quickadd feature can be turned OFF by adding a Q to the first line of the
input file.
R -- Restart
The R option causes the program to read a user-supplied tree with less than
the full number of taxa as the starting point for sequential addition of the
remaining taxa. Thus, the sequence data must be followed by a valid (Newick
format) tree. (The phylip_tree/2, prolog fact format, is now also supported.)
The restart option can also be used to increase the range of the search for
alternative (better) trees. For example, you can take a tree produced with
only "local" tree rearrangements, and increase the rearrangements to
"regional" or "global" by combining the appropriate global option with the
restart option. If the starting tree was written by fastDNAml, then the
extent of rearrangements is saved with the tree, and will be used as the
starting point for the additional search. If the tree was already globally
optimized, then no additional searching will be performed.
To support the R option, after each taxon is added to the growing tree, and
after each round of rearrangements, the program appends a checkpoint tree to a
file called checkpoint.PID, where PID is the process number of the running
fastDNAml program. The last line of this file needs to be appended to the
input file when the R option is used. (This should not be confused with the U
(user tree) option, which expects a number followed by that number of trees.
No additional taxa are added to user trees.)
The UNIX utility tail can be used to remove the last tree from the checkpoint
file, and the utility cat can be used to append it to the input. For example,
the following script can be used to add a starting tree and the R option to a
data file, and restart fastDNAml:
#! /bin/sh
if test $# -ne 1
then echo "Usage: restart checkpoint_file"
exit
fi
read first_line # first line of data file
echo "$first_line R" # add restart option
cat - # rest of data file
tail -1 $1 # append last tree in checkpoint file
If this shell script is in the file called restart, then one might use the
command:
restart checkpoint.21312 < infile | fastDNAml > new_outfile
^script ^checkpoint tree ^data ^dnaml program ^output_file
If this is too opaque, don't worry about it, or talk with your local unix
wizard. In the mean time, this and other useful shell scripts are provided
with the program.
PHYLIP DNAML does not write checkpoint trees and does not have a restart
option.
T -- Transition/transversion ratio
Use a user-specified ratio of transition to transversion type substitutions.
Without the T option, a value of 2.0 is used. Requires an auxiliary data line
of the form:
T ratio
Example:
5 114 T
T 1.0
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
(Note that a T option with a value of 2.0 does nothing, but it can provide
a last auxiliary data line following optional auxiliary data. See the
examples for G and Y.)
U -- User Tree(s)
Read an input line with the number of user-specified trees, followed by the
specified number of trees. These data immediately follow the sequence data.
The trees must be in Newick format, and terminated with a semicolon. (The
program also accepts a pseudo_newick format, which is a valid prolog fact.)
The tree reader in this program is more powerful than that in PHYLIP 3.3. In
particular, material enclosed in square brackets, [ like this ], is ignored as
comments; taxa names can be wrapped in single quotation marks to support the
inclusion of characters that would otherwise end the name (i.e., '(', ')',
':', ';', '[', ']', ',' and ' '); names of internal nodes are properly
ignored; and exponential notation (such as 1.0E-6) for branch lengths is
supported.
W -- Weights
Read user-specified column weighting information. This option requires
auxiliary data of the form:
Weights list_of_weight_values [per site, one or more lines]
Example:
5 114 W
Weights 111111111111001100000100011111100000000000000110000110000000
111101111111111111111111011100000111001011100000000011
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
It is necessary that the weight values not start before the 11'th character in
the line, or some of them will be lost. Weights from 0 to 35 are indicated by
the series: 0, 1, 2, 3, ..., 9, A, B, ..., Y, Z.
PHYLIP DNAML does not support user weights with values other than 1 or 0.
This limit has been removed in fastDNAml to permit the use of user weights
as a mechanism for representing a bootstrap sample (that is, only the
auxiliary data lines change, not the body of the data file).
Y -- Write Tree (***** Changed in version 1.1 *****)
fastDNAml writes the final tree to an output file called treefile.PID. By
default the tree is in PHYLIP format. The Y option allows turning this off,
or changing the format of the tree.
The Y option by itself toggles the saving of the tree, on or off. If there
is also an auxiliary input line of the form:
Y number
where number can be 1, 2, or 3, the number selects one of three tree output
formats:
1 Newick
2 Prolog
3 PHYLIP (default)
Newick is the tree standard used by PAUP, MacClade, and serveral other
programs. The tree includes a comment about the analysis that the tree is
based upon. fastDNAml uses this comment when it reads a tree. In addition,
the names of the taxa are enclosed in quotation marks. Both of these
features of the file make it incompatible with the PHYLIP package.
PHYLIP is the subset of the Newick tree standard used by programs in the
PHYLIP package. There are no comments and no quotations marks around names.
(If a name includes unusual characters, such as a comma, fastDNAml will put
it in quotation marks, making it a valid tree, but it cannot be read by the
PHYLIP programs.)
The Prolog format very similar to the Newick format, but it is a valid prolog
fact that permits direct loading into some sequence analysis tools that we
use. The structure of the term is:
pseudo_newick([Comment], (Subtree1, Subtree2, Subtree3): Length).
where each subtree is either
(Subtree1,Subtree2): Length
or
Label: Length
The comment is a valid prolog term when && is defined as a unary operator.
Label is a prolog atom (it is a valid Newick label, with single quotation
marks). Length is a number.
Because the Y auxiliary input line is optional, it cannot be the last auxiliary
data line.
Examples. To turn of the saving of the tree,
5 114 Y
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
or, to change the output to the full Newick format,
5 114 Y T
Y 1
T 2.0
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
...
PHYLIP DNAML does not append the PID (process ID) to the tree file name and
does not support the full Newick standard or the prolog format output.
=============================================================================
Acknowledgements:
The origin and development of fastDNAml as a program to extend the use of
maximum likelihood phylogenetic inference to larger sets of DNA sequences
was encouraged by Carl Woese. Through the development and evolution of the
program, Joseph Felsenstein has been extremely helpful and encouraging.
Numerous users have made suggestions and/or reported program bugs:
Gary Nunn
Tom Schmidt
Ross Overbeek
Hideo Matsuda
Mitchell Sogin
Brenden Rielly
=============================================================================
Examples:
Data file with empirical frequencies (generic analysis) (notice that blank
lines are permitted in the data):
5 114
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
Data file with empirical frequencies and a random addition order:
5 114 J
J 137
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
Data file with empirical frequencies and a bootstrap resampling:
5 114 B
B 137
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
Data with weighting mask and rate categories:
5 114 W C
Weights 111111111111001100000100011111100000000000000110000110000000
111101111111111111111111011100000111001011100000000011
C 10 0.0625 0.125 0.25 0.5 1 2 4 8 16 32
Categories 5111136343678975AAA8949995566778888889AAAAAA9239898629AAAAA9
633792246624457364222574877188898132984963499AA9899975
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
Data with three user-specified tree branching orders:
5 114 U
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
3
(Sequence1,(Sequence2,Sequence3),(Sequence4,Sequence5));
(Sequence2,(Sequence1,Sequence3),(Sequence4,Sequence5));
(Sequence3,(Sequence1,Sequence2),(Sequence4,Sequence5));
Data with transition/transversion ratio and base frequencies to
simulate Jukes & Cantor model:
5 114 T F
T 0.501
F 0.25 0.25 0.25 0.25
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
Non-interleaved data:
5 114 I
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
Non-interleaved data by editing a GenBank format (make sure that the names are
padded to at least ten characters with blanks):
5 114 I
Sequence1
1 ACACGGTGTC GTATCATGCT GCAGGATGCT AGACTGCGTC ANATGTTCGT ACTAACTGTG
61 AGCTCGATGA TCGGTGACGT AGACTCAGGG GCCATGCCGC GAGTTTGCGA TGCG
Sequence2
1 ACGCGGTGTC GTGTCATGCT ACATTATGCT AGACTGCGTC GGATGCTCGT ATTGACTGCG
61 AGCACGGTGA TCAATGACGT AGNCTCAGGR TCCACGCCGT GACTTTGTGA TNCG
Sequence3
1 ACGCGGTGCC GTGTNATGCT GCATTATGCT CGACTGCGRC GGATGCTAGT ATTGACTGCG
61 AGCACGATGA CCGATGACGT AGACTGAGGG TCCGTGCCGC GACTTTGTGA TGCG
Sequence4
1 ACGCGCTGCC GTGTCATCCT ACACGATGCY AGACAGCGTC AGCTGCTAGT ACTGGCTGAG
61 ACCTCGGTGA TTGATGACGT AGACTGCGGG TCCATGCCGC GATTTTGCGR TGCG
Sequence5
1 ACGCGCTGTC GTGTCATACT GCAGGATGCT AGACTGCGTC AGCTGCTAGT ACTGGCTGAG
61 ACCTCGATGC TCGATGACGT AGACTGCGGG TCCATGCCGT GATTTTGCGA TGCG
Data analysis restarted from a four-taxon tree (which happens to be wrong,
but it will be corrected by local rearrangements after the tree is read):
5 114 R
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;
Data analysis restarted from a four-taxon tree (which is wrong, and which
will not be corrected after the tree is read due to the suppression of all
rearrangements by the global 0 0 option):
5 114 R G T
G 0 0
T 2.0
Sequence1 ACACGGTGTCGTATCATGCTGCAGGATGCTAGACTGCGTCANATGTTCGTACTAACTGTG
Sequence2 ACGCGGTGTCGTGTCATGCTACATTATGCTAGACTGCGTCGGATGCTCGTATTGACTGCG
Sequence3 ACGCGGTGCCGTGTNATGCTGCATTATGCTCGACTGCGRCGGATGCTAGTATTGACTGCG
Sequence4 ACGCGCTGCCGTGTCATCCTACACGATGCYAGACAGCGTCAGCTGCTAGTACTGGCTGAG
Sequence5 ACGCGCTGTCGTGTCATACTGCAGGATGCTAGACTGCGTCAGCTGCTAGTACTGGCTGAG
AGCTCGATGATCGGTGACGTAGACTCAGGGGCCATGCCGCGAGTTTGCGATGCG
AGCACGGTGATCAATGACGTAGNCTCAGGRTCCACGCCGTGACTTTGTGATNCG
AGCACGATGACCGATGACGTAGACTGAGGGTCCGTGCCGCGACTTTGTGATGCG
ACCTCGGTGATTGATGACGTAGACTGCGGGTCCATGCCGCGATTTTGCGRTGCG
ACCTCGATGCTCGATGACGTAGACTGCGGGTCCATGCCGTGATTTTGCGATGCG
(Sequence4:0.1,Sequence2:0.1,(Sequence1:0.1,Sequence5:0.1):0.1):0.0;
|
|||||||||||||||||||||
| This
document is available as http://biowulf.nih.gov/apps/fastDNA.html Biowulf home page | Helix Systems | NIH |
|||||||||||||||||||||