Applications on Biowulf
[Sequence Analysis]
[Phylogenetics/Linkage]
[Computational Chemistry/Molecular Modeling]
[Proteomics/Mass Spectrometry]
[Mathematics/Statistics]
[Image Analysis]
[Structural Biology]
[General]
[Utilities]
Sequence Analysis
BLAST on Biowulf
BLAST, developed at NCBI, is a set of programs to find similarity between
a query protein or DNA sequence and a sequence database. A scheme for
efficiently running a large number of sequence files against a variety
of BLAST databases has been implemented on Biowulf.
EMBOSS
The EMBOSS package is a comprehensive suite of sequence analysis software
that can perform sequence alignment, motif identification, pattern
analysis, and more.
WU-Blast
WU-BLAST, developed at Washington University, is a fast, gapped BLAST with
statistics, intended to find similarity between a query protein or DNA
sequence and a sequence database.
BLAT
BLAT is a DNA/Protein Sequence Analysis program written by Jim Kent
at UCSC. It is designed to quickly find sequences of 95% and greater
similarity of length 40 bases or more. It may miss more divergent or
shorter sequence alignments. It will find perfect sequence matches of
33 bases, and sometimes find them down to 22 bases. BLAT on proteins
finds sequences of 80% and greater similarity of length 20 amino acids
or more. In practice, DNA BLAT works well on primates and protein BLAT
on land vertebrates. See the documentation for
details on how to run BLAT on Biowulf.
FASTA
The fasta program package contains many programs for searching
DNA and protein databases and one program (prss) for evaluating statistical
significance from randomly shuffled sequences.
Meme & Mast
Meme is designed to discover motifs (highly conserved regions) in groups of related DNA or
protein sequences, and Mast will search sequence databases using motifs.
HMMER
Profile hidden Markov models (profile HMMs) can be used to do
sensitive database searching using statistical descriptions of a sequence
family's consensus. HMMER uses profile HMMs for several types of homology
searches.
RepeatMasker
RepeatMasker is a program that screens DNA sequences for interspersed
repeats and low complexity DNA sequences. The output of the program is
a detailed annotation of the repeats that are present in the query sequence
as well as a modified version of the query sequence in which all the annotated
repeats have been masked (default: replaced by Ns). On average, the program
currently masks almost 50% of a human genomic DNA sequence.
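The masking step described above can be illustrated with a minimal sketch. This is not RepeatMasker's code: the function names, the 0-based half-open interval convention, and the example annotation are all assumptions made for illustration.

```python
def mask_repeats(sequence, repeat_intervals, mask_char="N"):
    """Return a copy of `sequence` with each annotated interval masked.

    `repeat_intervals` is a list of hypothetical (start, end) pairs,
    0-based and half-open, standing in for repeat annotations.
    """
    masked = list(sequence)
    for start, end in repeat_intervals:
        # Clamp to the sequence bounds so bad annotations cannot raise.
        for i in range(max(start, 0), min(end, len(masked))):
            masked[i] = mask_char
    return "".join(masked)


def masked_fraction(masked_sequence, mask_char="N"):
    """Fraction of positions equal to `mask_char` (includes pre-existing Ns)."""
    if not masked_sequence:
        return 0.0
    return masked_sequence.count(mask_char) / len(masked_sequence)


# Toy query: 8 bases, an 8-base poly-T stretch, then 6 more bases.
query = "ACGTACGT" + "T" * 8 + "ACGTAC"
annotations = [(8, 16)]  # assumed annotation covering the poly-T stretch

masked = mask_repeats(query, annotations)
print(masked)
print(masked_fraction(masked))
```

The masked fraction is the quantity behind the "almost 50%" figure quoted for human genomic DNA.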
Jim Kent Library jksrc454.zip
A collection of executables from
Jim Kent has been compiled on Biobos. The programs perform a multitude
of tasks from simple number crunching to highly specific sequence analysis
and database construction. The executables are located in the directory
/usr/local/ucsc on biowulf.
Scientific databases
A list of all nucleotide, protein, structural, and other databases available on the
system for BLAST, WU-BLAST, FASTA, etc., and their update status.
Phylogenetic/Linkage Analysis
Solar
SOLAR is a package of software to perform several kinds of statistical genetic
analysis, including linkage analysis, quantitative genetic analysis, and
covariate screening. The name SOLAR stands for "Sequential Oligogenic
Linkage Analysis Routines."
SimWalk
SimWalk2 is a statistical genetics computer application for haplotype, parametric
linkage, non-parametric linkage (NPL), identity by descent (IBD) and mistyping
analyses on any size of pedigree. SimWalk2 uses Markov chain Monte Carlo
(MCMC) and simulated annealing algorithms to perform these multipoint
analyses.
Tree-Puzzle
TREE-PUZZLE is a computer program to reconstruct phylogenetic
trees from molecular sequence data by maximum likelihood. It implements
a fast tree search algorithm, quartet puzzling, that allows analysis
of large data sets and automatically assigns estimations of support
to each internal branch.
Merlin
MERLIN uses sparse trees to represent gene flow in pedigrees
and is one of the fastest pedigree analysis packages around (Abecasis
et al., 2002).
Loki
Loki is a linkage analysis package, primarily for
large and complex pedigrees, which uses Markov chain Monte
Carlo (MCMC) techniques to avoid many of the computational
problems that prevent exact computational methods from being used
for large pedigrees.
FASTLINK/FastSLINK
FASTLINK is a modified and improved version of the original
LINKAGE suite for genetic linkage analysis. The additional LINKAGE utilities
are also installed. FastSLINK merges code from FASTLINK v 2.x
into the SLINK package, which simulates and analyzes replicates.
Computational Chemistry/Molecular Modeling
AMBER
AMBER (Assisted Model Building with Energy Refinement) is a package
of molecular simulation programs. Version 9 is currently installed on
Biowulf. Major programs in the AMBER package include sander, gibbs,
nmode, and LEaP.
APBS
APBS (Adaptive Poisson-Boltzmann Solver) is a software package for the
numerical solution of the Poisson-Boltzmann equation (PBE), one of the
most popular continuum models for describing electrostatic interactions
between molecular solutes in salty, aqueous media.
Autodock
AutoDock is a suite of automated docking tools designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure.
CHARMM on Biowulf
CHARMM (Chemistry at HARvard Molecular Mechanics) is a program
which supports a wide range of theoretical modeling calculations of
the structure and dynamics of biological molecules. In addition to energy
minimization and molecular dynamics simulations, Monte Carlo sampling,
use of genetic algorithms, and several interfaces to quantum codes (AM1,
GAMESS) are available or under development. Recent CHARMM versions have
been made available for use on Biowulf, as a joint effort between NHLBI/LBC
Computational Biophysics Section and CBER/OVRR Biophysics Lab and with
the support of Biowulf Staff. Multiple executables are available for
each version, in order to support larger molecular systems, and the
different types of parallel communications available on Biowulf, i.e.
Ethernet and Myrinet 2000. Support files are also available for
the above versions, e.g. version .doc files, and the standard
topology and parameter files.
CHARMM is a fairly sophisticated and complicated command-line program; detailed CHARMM
Documentation is available online.
GAMESS
GAMESS is a program for ab initio quantum chemistry. Briefly, GAMESS
can compute RHF, ROHF, UHF, GVB, and MCSCF wavefunctions,
with CI and MP2 energy corrections available for some of these. Analytic
gradients are available for these SCF functions, for automatic geometry
optimization, transition state searches, or reaction path following.
Computation of the energy hessian permits prediction of vibrational
frequencies. A variety of molecular properties, ranging from simple
dipole moments to frequency dependent hyperpolarizabilities may be computed.
Many basis sets are stored internally, and together with effective core
potentials, all elements up to Radon may be included in molecules. Several
graphics programs are available for viewing of the final results. Many
of the computational functions can be performed using direct techniques,
or in parallel on appropriate hardware.
GAUSSIAN 03
Gaussian 03 is a series of electronic structure programs performing
computations starting from the basic laws of quantum mechanics. Gaussian
can predict energies, molecular structures, vibrational frequencies
for systems in the gas phase and in solution, and it can model them
in both their ground state and excited states.
GROMACS
GROMACS is a versatile package to perform molecular dynamics, i.e. simulate the
Newtonian equations of motion for systems with hundreds to millions of
particles. It is primarily designed for biochemical molecules like proteins
and lipids that have a lot of complicated bonded interactions, but since
GROMACS is extremely fast at calculating the nonbonded interactions (that
usually dominate simulations) many groups are also using it for research
on non-biological systems, e.g. polymers.
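To make "simulate the Newtonian equations of motion" concrete, here is a minimal sketch of a generic velocity Verlet integrator applied to a single particle on a spring. It is an illustration only, not GROMACS code and not necessarily the integrator GROMACS uses; the function names, the harmonic force, and the unit mass and spring constant are all assumptions.

```python
import math


def velocity_verlet(position, velocity, force_fn, mass, dt, n_steps):
    """Integrate m * x'' = F(x) for n_steps of size dt (1D, one particle)."""
    force = force_fn(position)
    for _ in range(n_steps):
        # Half-step velocity update, full-step position update,
        # then finish the velocity update with the new force.
        velocity_half = velocity + 0.5 * dt * force / mass
        position = position + dt * velocity_half
        force = force_fn(position)
        velocity = velocity_half + 0.5 * dt * force / mass
    return position, velocity


def spring_force(x, k=1.0):
    """Hypothetical bonded interaction: a harmonic spring, F = -k * x."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the spring system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Start at x = 1 with zero velocity and integrate to t = 10.
x, v = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, n_steps=1000)
print("numerical x:", x, "analytic x:", math.cos(10.0))
print("total energy:", total_energy(x, v))
```

A production package evaluates forces over all bonded and nonbonded pairs at every step, which is where most of the computing time goes.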
NAMD
NAMD is a parallel molecular dynamics program for UNIX platforms designed
for high-performance simulations in structural biology. It is developed
by the Theoretical Biophysics Group at the Beckman Center, University
of Illinois. NAMD is particularly well suited to Beowulf clusters, as
it was specifically designed to run efficiently on parallel machines.
VMD, the molecular visualization program integrated with
NAMD, is also available on the Helix Systems.
PROSPECT
PROSPECT is a threading-based protein structure prediction system. PROSPECT
will find structural homologs of a target sequence, even when the structural
homolog sequences have insignificant identity to the target sequence.
Q-Chem
Q-Chem is an ab initio electronic structure program capable of performing
first-principles calculations on both the ground and excited states of molecules.
Schrödinger
A limited number of Schrödinger applications (such as MacroModel, Jaguar, and QikProp)
are available through the Molecular Modeling Interest Group.
Proteomics/Mass Spectrometry
OMSSA
An efficient search engine for identifying MS/MS peptide spectra by searching
libraries of known protein sequences. OMSSA scores significant hits with
a probability score developed using classical hypothesis testing, the
same statistical method used in BLAST.
X!Tandem
Matches tandem mass spectra with peptide sequences for protein identification.
Mathematical Analysis / Statistics
GAUSS
The GAUSS Mathematical and Statistical System is a fast matrix
programming language designed for computationally intensive tasks, which
has a wide variety of statistical, mathematical and matrix handling
routines.
Matlab
Matlab integrates mathematical computing, visualization, and
a powerful language to provide a flexible environment for technical
computing.
Mathematica
Mathematica is a fully integrated environment for technical and scientific computing. Mathematica combines numerical and symbolic computation, visualization, and programming in a single, flexible interactive system.
Octave
GNU Octave is an open-source language for numerical calculations that has a command-line
interface and can interpret many (but not all) Matlab scripts. It is not license-limited
and so can be used for many simultaneous independent runs.
R
R (the R Project) is a language and environment for statistical
computing and graphics. R is similar to S, and provides a wide variety
of statistical and graphical techniques (linear and nonlinear modelling,
statistical tests, time series analysis, classification, clustering,
...).
SAS
The SAS System is an integrated, hardware-independent system
of applications software for data access, management, statistical analysis
and report writing. The Base SAS windowing environment provides a full-screen
facility for interacting with all parts of a SAS program.
Scilab
Scilab is an open-source alternative to Matlab which includes hundreds of
mathematical functions and the ability to interactively add C/Fortran programs. It includes a
Matlab->Scilab converter.
Image Analysis
FSL
FSL is a comprehensive library of image analysis and statistical tools for FMRI, MRI and DTI brain imaging data.
AFNI
AFNI (Analysis of Functional NeuroImages) is a set of C programs for processing, analyzing, and displaying functional MRI (FMRI) data - a technique for mapping human brain activity.
EMAN
EMAN is a suite of scientific image processing tools aimed primarily at the transmission electron microscopy community, though it is beginning to be used in other fields as well.
Structural Biology
CNS
Crystallography and NMR System (CNS) is a flexible multi-level
package for macromolecular structure determination.
XPLOR-NIH
Xplor-NIH is a structure determination program which builds
on the X-PLOR program, including additional tools for NMR analysis.
The advantage of running Xplor-NIH on Biowulf is the ability to spawn a large
number of independent refinement jobs which would run on multiple Biowulf
nodes.
PovRay
POVRAY (Persistence of Vision RAYtracer) is a high-quality tool
for creating three-dimensional graphics. Raytraced images are publication-quality
and 'photo-realistic', but are computationally expensive so that large
images can take many hours to create. PovRay images can also require
more memory than many desktop machines can handle. To address these
concerns, a parallelized version of PovRay has been installed on the
Biowulf system.
Qs
Qs (Queen of Spades) is a "brute force" style molecular replacement
program which uses a method based on a reverse Monte Carlo minimisation
of the conventional crystallographic R-factor in the 6n-dimensional
space defined by the rotational and translational parameters of the
n molecules. Because all parameters of all molecules are determined
simultaneously, this algorithm should improve the signal-to-noise ratio
in difficult cases involving high crystallographic/non-crystallographic
symmetry in tightly packed crystal forms.
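For reference, the conventional crystallographic R-factor and the 6n-dimensional search space mentioned above can be written as follows. This is the standard textbook form; the symbols (the scale factor k, Euler-angle rotations, and translation components) are assumptions, and the exact target function used by Qs may differ.

```latex
% Conventional R-factor over reflections hkl, with scale factor k
R = \frac{\sum_{hkl} \bigl|\, |F_{\mathrm{obs}}(hkl)| - k\,|F_{\mathrm{calc}}(hkl)| \,\bigr|}
         {\sum_{hkl} |F_{\mathrm{obs}}(hkl)|}

% n rigid molecules, each with 3 rotational and 3 translational parameters
\mathbf{p} = (\alpha_1, \beta_1, \gamma_1, t_{x,1}, t_{y,1}, t_{z,1}, \;\dots,\;
              \alpha_n, \beta_n, \gamma_n, t_{x,n}, t_{y,n}, t_{z,n}) \in \mathbb{R}^{6n}
```

Minimising R over all of p at once is what distinguishes this approach from placing one molecule at a time.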
AMoRe
AMoRe is an automated utility for performing molecular replacement
using fast rotation and translation functions in a step-wise fashion.
HADDOCK
HADDOCK (High Ambiguity Driven protein-protein DOCKing) is an
approach for predicting protein-protein complex structures that makes
use of biochemical and/or biophysical interaction data such as chemical
shift perturbation data resulting from NMR titration experiments or
mutagenesis data.
Rosetta++
The Rosetta++ software suite can perform de novo protein structure
predictions, identify low free energy sequences for target
protein backbones, predict the structure of a protein-protein
complex from the individual structures of the monomer components,
incorporate NMR data into the basic Rosetta protocol to accelerate
the process of NMR structure prediction, and more...
CSRosetta
Chemical-Shift-ROSETTA is a robust protocol to use NMR
chemical shifts for de novo protein structure generation by
SPARTA-based selection of protein fragments from the PDB,
in conjunction with a regular ROSETTA Monte Carlo assembly and
relaxation method.
ZDOCK
ZDOCK predicts protein-docking models, and uses a fast Fourier
transform to search all possible binding modes for proteins,
evaluating based on shape complementarity, desolvation energy, and
electrostatics.
Nest
Command-line homology model builder (written by Jason (Zhexin) Xiang) on par with MODELLER. To use, type nest at the prompt. Nest can
be used in conjunction with PROSPECT using prospect2pdb.pl.
General Purpose
Swarm
Swarm is a program designed to simplify submitting a group of commands
to the cluster. Some programs do not scale well and thus are not suited
to true parallelizing. Other programs may be such that each individual
job is very short, but many such jobs need to be run. Such programs
are well suited to running 'swarms of single-threaded jobs'. The Swarm
program simplifies this process. See the documentation
for details. Download swarm.
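As a rough sketch of preparing many short, single-threaded jobs, the script below builds a list of independent command lines, one per input file. It assumes that swarm accepts a plain-text file with one command per line; check the swarm documentation for the actual format and options. The program name `myprog`, the input file names, and the helper functions are hypothetical.

```python
def build_swarm_commands(input_files, command_template):
    """Return one shell command per input file.

    `command_template` uses the placeholders {infile} and {outfile};
    the output name is simply the input name with ".out" appended.
    """
    return [
        command_template.format(infile=name, outfile=name + ".out")
        for name in input_files
    ]


def write_command_file(path, commands):
    """Write the commands to `path`, one per line."""
    with open(path, "w") as handle:
        for command in commands:
            handle.write(command + "\n")


# Hypothetical inputs: seq001.fa, seq002.fa, seq003.fa
inputs = ["seq{:03d}.fa".format(i) for i in range(1, 4)]
commands = build_swarm_commands(inputs, "myprog {infile} > {outfile}")

for command in commands:
    print(command)

# To save the list for submission (file name is an assumption):
# write_command_file("commands.swarm", commands)
```

Each line is independent of the others, so the jobs can run in any order on any node.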
Utilities on Biowulf