Biowulf at the NIH
HMMER on Biowulf

Profile hidden Markov models for biological sequence analysis

Profile hidden Markov models (profile HMMs) support sensitive database searches based on a statistical description of a sequence family's consensus. HMMER is an implementation of profile HMMs for this kind of search.

HMMER (pronounced 'hammer', as in a more precise mining tool than BLAST) was developed by Sean Eddy at Washington University in St. Louis. The HMMER website is

HMMER User Guide (PDF)

HMMER is a CPU-intensive program parallelized with threads, so each instance of hmmsearch or another search program can use all the CPUs available on a node. HMMER on Biowulf is intended for those who need to run HMMER searches on large numbers of query sequences.

Several versions of HMMER are available on Helix/Biowulf. The easiest way to list the available versions and select one is with environment modules, for example:
[user@biowulf]$ module avail hmmer

----------------- /usr/local/Modules/3.2.9/modulefiles --------------------
hmmer/3.0rc1 hmmer/3.1b1

[user@biowulf]$ module load hmmer      (load the default latest version)

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) hmmer/3.1b1

[user@biowulf]$ module unload hmmer

[user@biowulf]$ module load hmmer/3.0rc1   (load a particular version)

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) hmmer/3.0rc1
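
After loading a module, it can be useful to confirm that the HMMER programs are actually reachable on the PATH. The following is a minimal sketch, not part of the Biowulf setup itself: the helper name have_tool is hypothetical, and the three program names are simply those used elsewhere on this page.

```shell
# Report whether a program is reachable on PATH (hypothetical helper).
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Intended to be run after `module load hmmer`.
for tool in hmmbuild hmmsearch hmmscan; do
    if have_tool "$tool"; then
        echo "$tool: $(command -v "$tool")"
    else
        echo "$tool: not found (is the hmmer module loaded?)"
    fi
done
```

If any program is reported as not found, re-check the output of module list before submitting jobs.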

Searching a sequence database with a profile HMM
One use of HMMER is to search a sequence database with a single profile HMM built from a set of aligned sequences. You would first align your set of sequences with a multiple-alignment program such as ClustalW, then build a profile HMM from the alignment with hmmbuild, then search a database with this profile HMM using hmmsearch. Note that hmmalign aligns sequences to an existing profile HMM, so it cannot produce the initial alignment. (Read the HMMER User Guide for details on the format of the aligned sequence file.)

Create a batch script along the following lines:

#PBS -N hmmer
#PBS -j oe

# this file is hmmer.bat

# load the latest default version
module load hmmer

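cd /data/user/mydir
# globins4.align is a multiple alignment prepared beforehand (e.g. with ClustalW);
# hmmalign cannot create it, because it requires an existing profile HMM
hmmbuild globins4.hmm globins4.align
hmmsearch globins4.hmm /fdb/fastadb/nr.aa.fas > globins.out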

Submit this job with:

qsub -l nodes=1 hmmer.bat
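
Where a fixed thread count is preferable to the default autothreading, the build and search steps can be wrapped as in the sketch below. It assumes the HMMER3 options --cpu (number of worker threads) and -o (write output to a file); the function names run and search_with_profile and the DRY_RUN variable are hypothetical, introduced only for illustration.

```shell
# Print the command in dry-run mode, otherwise execute it (hypothetical helper).
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Build a profile HMM from an alignment, then search a database with it.
# $1 = alignment file, $2 = sequence database, $3 = number of threads
search_with_profile() {
    aln=$1
    db=$2
    ncpu=$3
    base="${aln%.*}"                # strip the file extension, e.g. globins4
    run hmmbuild "$base.hmm" "$aln" &&
    run hmmsearch --cpu "$ncpu" -o "$base.out" "$base.hmm" "$db"
}
```

Setting DRY_RUN=1 before calling search_with_profile prints the two commands instead of running them, which is a cheap way to check paths before submitting the batch job.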

Searching a profile HMM database with a query sequence
The hmmscan program is for annotating all the different known/detectable domains in a given sequence. If you have only a single sequence, you could run hmmscan interactively on Helix. If you have several query sequences, it is advantageous to run them simultaneously on Biowulf.

hmmscan runs against an HMM database such as Pfam. The Pfam database is maintained and updated on our systems in /fdb/fastadb/pfam. You can also create your own HMM database -- see the HMMER User Guide for details.

Create a swarm command file with one line for each of the query sequences. Sample swarm command file:

---------------- file swarm.cmd ----------------------------------------------------
hmmscan  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq1 > /data/user/out/seq1.out
hmmscan  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq2 > /data/user/out/seq2.out
hmmscan  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq3 > /data/user/out/seq3.out
hmmscan  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq4 > /data/user/out/seq4.out
hmmscan  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq5 > /data/user/out/seq5.out
The HMMER search programs will 'autothread', i.e., by default they use all available CPUs on a node. Therefore this swarm job should be submitted so that only a single command runs on each node. Submit with:
swarm -t auto -f swarm.cmd --module hmmer
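
With many query sequences, the swarm command file shown above does not need to be written by hand. The sketch below generates one hmmscan line per file in a query directory. The variable names, the function name make_swarm_cmds, and the output naming scheme (query file name plus .out) are assumptions for illustration; the default paths follow the sample file above.

```shell
# Assumed locations, following the sample paths above; override as needed.
SEQ_DIR="${SEQ_DIR:-/data/user/seqs}"
OUT_DIR="${OUT_DIR:-/data/user/out}"
PFAM_DB="${PFAM_DB:-/fdb/fastadb/pfam/Pfam_fs}"
CMD_FILE="${CMD_FILE:-swarm.cmd}"

# Write one hmmscan command per query file to standard output.
make_swarm_cmds() {
    for seq in "$SEQ_DIR"/*; do
        [ -f "$seq" ] || continue      # skip non-files and the unmatched glob
        name=$(basename "$seq")
        printf 'hmmscan %s %s > %s\n' "$PFAM_DB" "$seq" "$OUT_DIR/$name.out"
    done
}

# Only write the command file when the query directory exists.
if [ -d "$SEQ_DIR" ]; then
    make_swarm_cmds > "$CMD_FILE"
fi
```

The resulting file can then be submitted with the swarm command shown above.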
The entire HMMER suite of programs is available in /usr/local/apps/hmmer. Note that in HMMER2 only hmmcalibrate, hmmsearch and hmmpfam are parallelized; in HMMER3, hmmpfam is replaced by hmmscan and hmmcalibrate is no longer needed.

A large collection of protein sequence databases is in /fdb/fastadb/.
Fasta-format databases and update status.

User Guide for v3.1b1 (PDF)