
    HMMER on Biowulf

    Profile hidden Markov models for biological sequence analysis

    Profile hidden Markov models (profile HMMs) are statistical descriptions of a sequence family's consensus, and they can be used for sensitive database searching. HMMER uses profile HMMs and can be useful in situations such as:
    • you are working with an evolutionarily diverse protein family, and a BLAST search with any individual sequence may not find the rest of the sequences in the family.
    • the top hits in a BLAST search are hypothetical sequences from genome projects.
    • your protein consists of several domains of different types.

    HMMER (pronounced 'hammer', as in a more precise mining tool than BLAST) was developed by Sean Eddy at Washington University in St. Louis. The HMMER website is hmmer.janelia.org.

    HMMER User Guide (PDF)

    HMMER is very CPU-intensive and is parallelized using threads, so each instance of hmmpfam or hmmsearch can use all the CPUs available on a node. HMMER on Biowulf is intended for those who need to run HMMER searches on large numbers of query sequences.


    Searching query sequences against a profile HMM database

    One use of HMMER is to look for known domains in a query sequence by searching a single sequence against a library of HMMs. One such library is the PFAM database. PFAM is available and kept up to date on our systems in the directory /fdb/fastadb/pfam. It is also possible to create your own database; see the user guide for details.

    Create a swarm command file with one line for each of the query sequences. Sample swarm command file:

    ---------------- file swarm.cmd ----------------------------------------------------
    hmmpfam  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq1 > /data/user/out/seq1.out
    hmmpfam  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq2 > /data/user/out/seq2.out
    hmmpfam  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq3 > /data/user/out/seq3.out
    hmmpfam  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq4 > /data/user/out/seq4.out
    hmmpfam  /fdb/fastadb/pfam/Pfam_fs  /data/user/seqs/myseq5 > /data/user/out/seq5.out
    [....]
    ------------------------------------------------------------------------------------
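For a large set of queries, a command file like the sample above can be generated rather than typed by hand. The following is a minimal sketch, not part of HMMER or swarm: `make_swarm_cmds` is a hypothetical helper name, it assumes every regular file in the sequence directory is a query, that paths contain no whitespace, and it names each output after the query file (myseq1.out rather than seq1.out).

```shell
#!/bin/sh
# Hypothetical helper: print one hmmpfam command line per query file.
# Arguments: HMM database, directory of query sequences, output directory.
# The body runs in a subshell "( ... )" so its variables stay local.
make_swarm_cmds() (
    db=$1
    seq_dir=$2
    out_dir=$3
    for seq in "$seq_dir"/*; do
        [ -f "$seq" ] || continue      # skip subdirectories and an unmatched glob
        name=$(basename "$seq")
        printf 'hmmpfam  %s  %s > %s\n' "$db" "$seq" "$out_dir/$name.out"
    done
)

# Example call, using the paths from the sample file:
# make_swarm_cmds /fdb/fastadb/pfam/Pfam_fs /data/user/seqs /data/user/out > swarm.cmd
```

The function only prints command lines, so the result can be inspected before it is redirected into swarm.cmd.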
    
    The HMMER programs hmmcalibrate, hmmsearch, and hmmpfam are set up to use all available CPUs on a node. This swarm job should therefore be submitted so that only a single command runs on each node. Submit with:
    swarm -f swarm.cmd -n 1 
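After a large swarm finishes, it can be useful to check which searches did not produce output. The sketch below is one possible approach, not a swarm feature: `failed_cmds` is a hypothetical helper that reads a command file in the format shown above and prints every line whose redirected output file is missing or empty. It assumes each command redirects with a single `>` to a path without whitespace, and that a completed hmmpfam run always writes at least some output.

```shell
#!/bin/sh
# Hypothetical helper: list commands whose output file is missing or empty.
# Argument: a swarm command file with lines of the form "command > outfile".
failed_cmds() (
    # "|| [ -n "$line" ]" also handles a final line without a trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            *'>'*) ;;                  # only consider lines that redirect output
            *) continue ;;
        esac
        out=${line##*>}                # text after the last '>'
        out=$(printf '%s' "$out" | tr -d '[:space:]')   # drop blanks around the path
        [ -s "$out" ] || printf '%s\n' "$line"          # -s: exists and is non-empty
    done < "$1"
)

# Example call:
# failed_cmds swarm.cmd > rerun.cmd
```

The resulting rerun.cmd has the same format as the original command file, so it can be submitted the same way with `swarm -f rerun.cmd -n 1`.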
    


    Searching a sequence database for homologues of a protein family

    Another common use of HMMER is to search a sequence database for homologues of a protein family of interest. If you start with a file containing several sequences belonging to the family, you can use this to find remote homologues from a protein database. The following sample batch script will run hmmbuild, hmmcalibrate, and hmmsearch in sequence.
    ----------- file hmm_homolog  -----------------------------------------
    #!/bin/csh
    #PBS -N Hmmer
    #PBS -m be
    #PBS -k oe
    
    cd /data/user/mydir
    hmmbuild -g globins.hmm globins.msf 
    hmmcalibrate  globins.hmm 
    hmmsearch globins.hmm /fdb/fastadb/ecoli.aa.fas
    ------------------------------------------------------------------------
    
    This script starts with a multiple sequence alignment of a protein domain or protein family in the file globins.msf. This file can be created by aligning sequences with ClustalW. The hmmbuild command builds a profile HMM from the alignment. The hmmcalibrate command calibrates the model's E-value statistics against random sequences, which increases the sensitivity of the search. The hmmsearch command then uses the globin model to search for globin domains in the E. coli database. See the HMMER documentation for more information.

    Submit this file with:

    qsub -l nodes=1 hmm_homolog
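To run the same three steps for another family or database, the batch script can be written out from a small template. This is only a sketch: `write_hmm_job` is a hypothetical helper, it assumes the alignment is named after the family with a .msf extension, and it redirects the hmmsearch results to a file named FAMILY.hmmsearch.out, which the sample script above does not do.

```shell
#!/bin/sh
# Hypothetical helper: print a csh batch script that runs hmmbuild,
# hmmcalibrate, and hmmsearch for one family.
# Arguments: working directory, family name (alignment is FAMILY.msf),
# sequence database to search.
write_hmm_job() (
    workdir=$1
    family=$2
    db=$3
    printf '%s\n' \
        '#!/bin/csh' \
        '#PBS -N Hmmer' \
        '#PBS -m be' \
        '#PBS -k oe' \
        '' \
        "cd $workdir" \
        "hmmbuild -g $family.hmm $family.msf" \
        "hmmcalibrate $family.hmm" \
        "hmmsearch $family.hmm $db > $family.hmmsearch.out"
)

# Example call, reproducing the sample script for the globin family:
# write_hmm_job /data/user/mydir globins /fdb/fastadb/ecoli.aa.fas > hmm_homolog
```

The generated file is submitted with the same qsub command as the hand-written version.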
    


    More Info

    The entire HMMER suite of programs is available in /usr/local/hmmer. Note that only hmmcalibrate, hmmsearch and hmmpfam are parallelized.

    A large collection of protein sequence databases is in /fdb/fastadb/.
    Fasta-format databases and update status.


This document is available as http://biowulf.nih.gov/apps/hmmer/index.html