Profile hidden Markov models for biological sequence analysis
Profile hidden Markov models (profile HMMs) can be used to do sensitive database searching using statistical descriptions of a sequence family's consensus. HMMER uses profile HMMs, and can be useful in situations like:
- if you are working with an evolutionarily diverse protein family, a BLAST search with any individual sequence may not find the rest of the sequences in the family.
- the top hits in a BLAST search are hypothetical sequences from genome projects.
- your protein consists of several domains which are of different types.
HMMER (pronounced 'hammer', as in a more precise mining tool than BLAST) was developed by Sean Eddy at Washington University in St. Louis. The HMMER website is hmmer.janelia.org.
HMMER User Guide (PDF)
HMMER is very CPU-intensive and is parallelized using threads, so each instance of hmmsearch or the other search programs can use all the CPUs available on a node. HMMER on Biowulf is intended for those who need to run HMMER searches on large numbers of query sequences.
Create a batch script along the following lines:
#!/bin/bash
#PBS -N hmmer
#PBS -j oe
# this file is hmmer.bat

cd /data/user/mydir
hmmalign globins4.align globins45
hmmbuild globins4.hmm globins4.align
hmmsearch globins4.hmm /fdb/fastadb/nr.aa.fas > globins.out
Submit this job with:
qsub -l nodes=1 hmmer.bat
hmmscan runs against an HMM database such as Pfam. The Pfam database is maintained and updated on our systems in /fdb/fastadb/pfam. You can also create your own HMM database -- see the HMMER User Guide for details.
Create a swarm command file with one line for each of the query sequences. Sample swarm command file:
---------------- file swarm.cmd ----------------------------------------------------
hmmscan /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq1 > /data/user/out/seq1.out
hmmscan /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq2 > /data/user/out/seq2.out
hmmscan /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq3 > /data/user/out/seq3.out
hmmscan /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq4 > /data/user/out/seq4.out
hmmscan /fdb/fastadb/pfam/Pfam_fs /data/user/seqs/myseq5 > /data/user/out/seq5.out
[....]
------------------------------------------------------------------------------------
Submit the swarm with:
swarm -t auto -f swarm.cmd
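With many query sequences, writing the swarm command file by hand gets tedious. The following is a minimal sketch of one way to generate it, assuming each query sequence sits in its own file in a single directory. The function name make_swarm_file and its four arguments are hypothetical, not part of swarm or HMMER, and the output files are named after the input file (myseq1.out rather than seq1.out as in the sample).

```shell
#!/bin/bash
# Hypothetical helper (not part of swarm or HMMER): writes one hmmscan
# command line per query file into a swarm command file.
make_swarm_file() {
    local seq_dir="$1"    # directory with one query sequence per file
    local out_dir="$2"    # directory where hmmscan output should be written
    local db="$3"         # HMM database, e.g. /fdb/fastadb/pfam/Pfam_fs
    local cmd_file="$4"   # swarm command file to create
    local seq name

    : > "$cmd_file"       # create or truncate the command file
    for seq in "$seq_dir"/*; do
        [ -f "$seq" ] || continue          # skip subdirectories and empty globs
        name=$(basename "$seq")
        printf 'hmmscan %s %s > %s/%s.out\n' \
            "$db" "$seq" "$out_dir" "$name" >> "$cmd_file"
    done
}

# Example call, using the paths from the sample command file:
# make_swarm_file /data/user/seqs /data/user/out /fdb/fastadb/pfam/Pfam_fs swarm.cmd
```

The function only writes text, so the resulting file can be inspected before it is passed to swarm. The output directory must exist before the swarm runs, since the shell redirection in each line will not create it.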
A large collection of protein sequence databases is in /fdb/fastadb/.
Fasta-format databases and update status.


