BLAT is a DNA/protein sequence analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% and greater similarity of length 40 bases or more. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes find them down to 22 bases. BLAT on proteins finds sequences of 80% and greater similarity of length 20 amino acids or more. In practice, DNA BLAT works well on primates, and protein BLAT on land vertebrates. For more information see the BLAT web page or Jim Kent's web page.
The 'easyblat' script simplifies running large BLAT jobs. Put all your query sequences into a directory, and then type 'easyblat' at the Biowulf prompt. You will be prompted for all required parameters. The script will then decide what kind of node you need (based on the database you choose) and submit your job to as many nodes as are available (max 24).
Sample session (user input follows each prompt):
biowulf% easyblat
EasyBLAT: BLAT (not Blast!) for large numbers of sequences
Enter the directory which contains your input sequences: /data/user/mydir/seqs
Enter the directory where you want your BLAT output to go: /data/user/mydir/out
** WARNING: There are already files in /data/user/mydir/out which will be overwritten by this job.
** Continue? (y/n): y
The following databases are available:
H - Human Genome Feb 2009 assembly
M - Mouse Genome Jul 2007 assembly
O - Other databases
Enter H, M or O for a detailed list: H
Human Genome (Build 37, hg19, Feb 2009) assembly:
chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11
chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20,
chr21, chr22, chrX, chrY, chr_all
Enter human section to run against: chr_all
http://biowulf.nih.gov/blat.html has a full list of available parameters.
Any additional BLAT parameters (e.g. -maxGap=3): -minScore=35 -trimT
Creating parameter file /data/user/blat_tmp.12971/blat_par.12971
Submitting: qsub -v np=128,read=/data/user/blat_tmp.12971/blat_par.12971 -l nodes=16:g24 -N EasyBlat /usr/local/blat/nih/easyrunblat
Submitting to 16 nodes. Job number is 2384446.biobos
Monitor your job at http://biowulf.nih.gov/cgi-bin/queuemon?2384446.biobos
As shown above, easyblat does some simple error checking, such as checking whether your query sequences exist and warning you when the output directory already contains files. It will set up all temporary files and directories, and submit the job for you.
You can run against your own database (any fasta format file) by selecting 'other databases', and then entering the full pathname of the database you want to search. For example:
The following databases are available:
H - Human Genome (Apr 2006) assembly
M - Mouse Genome (Jul 2007) assembly
O - Other databases
Enter H, M or O for a detailed list: O
Other databases, updated weekly:
pdb - from the PDB 3-dimensional structures
drosoph - Drosophila sequences
ecoli - E. Coli sequences
mito - mitochondrial sequences
yeast - Yeast sequences
If using your own database, enter the full pathname.
Enter db to run against: /data/user/my_db.fas
Easyblat uses swarm. If you prefer to run swarm directly, set up a swarm command file along the following lines:
# this file is called blatcmd
# commands are 'blat database_file query_sequence outputfile'
#
blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq1.fas /data/user/blatout/seq1.out
blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq2.fas /data/user/blatout/seq2.out
blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq3.fas /data/user/blatout/seq3.out
blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq4.fas /data/user/blatout/seq4.out
[...]
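For more than a handful of queries, a command file like the one above could be generated with a short loop rather than typed by hand. The following is a minimal sketch, not part of easyblat or swarm: the function name make_blatcmd is hypothetical, and it assumes that every query file ends in .fas and that each output file should be named after its query, as in the example.

```shell
#!/bin/bash
# Sketch: write one 'blat database query output' line per query file.
# make_blatcmd is a hypothetical helper name used only for illustration.
# Assumes query files end in .fas, as in the example command file.
make_blatcmd() {
    local db=$1 query_dir=$2 out_dir=$3 cmdfile=$4
    local query name
    : > "$cmdfile"                         # start with an empty command file
    for query in "$query_dir"/*.fas; do
        [ -e "$query" ] || continue        # skip if the glob matched nothing
        name=$(basename "$query" .fas)
        printf 'blat %s %s %s\n' "$db" "$query" "$out_dir/$name.out" >> "$cmdfile"
    done
}

# Example call with the paths used in this section:
# make_blatcmd /fdb/genome/mm9/chr_all.fa /data/user/myseqs /data/user/blatout blatcmd
```

The resulting file can be inspected with 'cat blatcmd' before it is passed to swarm.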
The memory required for each blat command will be approximately the size of the database file. In this case, the file chr_all.fa is about 2.6 GB:
[user@biowulf ]$ ls -lh /fdb/genome/mm9/chr_all.fa
-rw-rw-r-- 1 helixapp staff 2.6G Mar 25 2008 /fdb/genome/mm9/chr_all.fa
so the swarm should be submitted with 3 GB of memory per process (-g 3):
swarm -g 3 -f blatcmd
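The rule of thumb above (memory roughly equal to the database size) can be turned into a small helper that rounds a file's size up to whole gigabytes. This is only a sketch: db_size_gb is a hypothetical name, it treats a gigabyte as 1024^3 bytes, and it adds no headroom beyond rounding up, so a real job may need a larger value.

```shell
#!/bin/bash
# Sketch: print a database file's size rounded up to whole gigabytes.
# db_size_gb is a hypothetical helper name used only for illustration.
# Assumes 1 GB = 1024^3 bytes and no extra headroom beyond rounding up.
db_size_gb() {
    local bytes
    bytes=$(wc -c < "$1")                          # file size in bytes
    echo $(( (bytes + 1073741823) / 1073741824 ))  # ceiling division by 1024^3
}

# Example use with the database from this section:
# swarm -g "$(db_size_gb /fdb/genome/mm9/chr_all.fa)" -f blatcmd
```

For the 2.6 GB chr_all.fa file this rounds up to 3, matching the value passed to -g above.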
- Any fasta-format file can serve as a BLAT database. A large collection of fasta-format databases is already available and updated on the Helix Systems. A list is at http://helix.nih.gov/Applications/helixdb.php?sort=format#Fasta.
- The time taken by each BLAT command varies according to the sequence size and number of hits. Thus, you may see some processes continue to run after others have completed, and the load on the nodes may drop. This is expected.
BLAT - The Blast-Like Alignment Tool. W. James Kent, Genome Research 12(4): 656-664, April 2002.
BLAT Suite Program Specifications and User Guide, at the UCSC Genome website. All BLAT options are listed on this page.


