BLAT is a DNA/protein sequence analysis program written by Jim Kent at UCSC. It is designed to quickly find sequences of 95% or greater similarity that are 40 bases or longer. It may miss more divergent or shorter sequence alignments. It will find perfect sequence matches of 33 bases, and sometimes finds them down to 22 bases. On proteins, BLAT finds sequences of 80% or greater similarity that are 20 amino acids or longer. In practice, DNA BLAT works well on primates, and protein BLAT on land vertebrates. For more information see the BLAT web page or Jim Kent's web page.
Sample session (user input follows each prompt):
    biowulf% easyblat

    EasyBLAT: BLAT (not Blast!) for large numbers of sequences

    Enter the directory which contains your input sequences: /data/user/mydir/seqs
    Enter the directory where you want your BLAT output to go: /data/user/mydir/out
    ** WARNING: There are already files in /data/user/mydir/out which will be overwritten by this job.
    ** Continue? (y/n): y

    The following databases are available:
      H - Human Genome Feb 2009 assembly
      M - Mouse Genome Jul 2007 assembly
      O - Other databases
    Enter H, M or O for a detailed list: H

    Human Genome (Build 37, hg19, Feb 2009) assembly:
      chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11
      chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21,
      chr22, chrX, chrY, chr_all
    Enter human section to run against: chr_all

    http://biowulf.nih.gov/blat.html has a full list of available parameters.
    Any additional BLAT parameters (e.g. -maxGap=3): -minScore=35 -trimT

    Creating parameter file /data/user/blat_tmp.12971/blat_par.12971
    Submitting: qsub -v np=128,read=/data/user/blat_tmp.12971/blat_par.12971 -l nodes=16:g24 -N EasyBlat /usr/local/blat/nih/easyrunblat
    Submitting to 16 nodes. Job number is 2384446.biobos
    Monitor your job at http://biowulf.nih.gov/cgi-bin/queuemon?2384446.biobos
As you see above, easyblat does some simple error checking, such as checking whether your query sequences exist. It will set up all temporary files and directories, and submit the job for you.
You can run against your own database (any fasta format file) by selecting 'other databases', and then entering the full pathname of the database you want to search. For example:
    The following databases are available:
      H - Human Genome (Apr 2006) assembly
      M - Mouse Genome (Jul 2007) assembly
      O - Other databases
    Enter H, M or O for a detailed list: O

    Other databases, updated weekly:
      pdb     - from the PDB 3-dimensional structures
      drosoph - Drosophila sequences
      ecoli   - E. Coli sequences
      mito    - mitochondrial sequences
      yeast   - Yeast sequences
    If using your own database, enter the full pathname.
    Enter db to run against: /data/user/my_db.fas
Easyblat uses swarm. If you prefer to run swarm directly, set up a swarm command file along the following lines:
    # this file is called blatcmd
    # commands are 'blat database_file query_sequence outputfile'
    #
    blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq1.fas /data/user/blatout/seq1.out
    blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq2.fas /data/user/blatout/seq2.out
    blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq3.fas /data/user/blatout/seq3.out
    blat /fdb/genome/mm9/chr_all.fa /data/user/myseqs/seq4.fas /data/user/blatout/seq4.out
    [...]
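For more than a handful of query files, writing the command file by hand gets tedious. The following is a minimal sketch of generating it, assuming every query file in one directory ends in .fas and each output should be named after its query. The function name make_blatcmd is hypothetical and is not part of BLAT or swarm.

```shell
# Sketch: print one 'blat database_file query_sequence outputfile' line
# per .fas file found in a query directory.
# make_blatcmd is a hypothetical helper name (an assumption, not a real tool).
# The body runs in a subshell, so its variables do not leak into the caller.
make_blatcmd() (
    db="$1"         # full path to the fasta-format database
    query_dir="$2"  # directory holding the query sequences (*.fas assumed)
    out_dir="$3"    # directory that will receive the BLAT output
    for query in "$query_dir"/*.fas; do
        # If nothing matched, the pattern stays unexpanded; skip it.
        [ -e "$query" ] || continue
        base="$(basename "$query" .fas)"
        printf 'blat %s %s %s\n' "$db" "$query" "$out_dir/$base.out"
    done
)

# Example usage, with the paths from the command file above:
# make_blatcmd /fdb/genome/mm9/chr_all.fa /data/user/myseqs /data/user/blatout > blatcmd
```

The function only prints commands and runs nothing, so the resulting file can be inspected before it is handed to swarm.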
The memory required for each blat command is approximately the size of the database file. In this case, the file chr_all.fa is about 2.6 GB:

    [user@biowulf ]$ ls -lh /fdb/genome/mm9/chr_all.fa
    -rw-rw-r-- 1 helixapp staff 2.6G Mar 25 2008 /fdb/genome/mm9/chr_all.fa

The swarm is therefore submitted with 3 GB of memory per process (-g 3):

    swarm -g 3 -f blatcmd
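The -g value can also be derived from the database size instead of being read off the ls output. This is a rough sketch, assuming that rounding the file size up to the next whole gigabyte is enough, as in the 2.6 GB example above; very large queries may need extra headroom. The function name swarm_mem_gb is hypothetical.

```shell
# Sketch: turn a database size in bytes into a whole-GB value for swarm -g.
# swarm_mem_gb is a hypothetical helper; the round-up rule is an assumption
# taken from the 2.6 GB -> '-g 3' example, not a documented formula.
swarm_mem_gb() (
    bytes="$1"
    gib=$((1024 * 1024 * 1024))
    # Integer ceiling division: any partial GB is rounded up.
    echo $(( (bytes + gib - 1) / gib ))
)

# Example usage:
# size=$(wc -c < /fdb/genome/mm9/chr_all.fa)
# swarm -g "$(swarm_mem_gb "$size")" -f blatcmd
```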
- Any fasta-format file can serve as a BLAT database. A large collection of fasta-format databases is already available and kept up to date on the Helix Systems. A list is at
- The time taken by each BLAT command varies with the sequence size and the number of hits. Thus, you may see some processes continue to run after others have completed, and the load on the nodes may drop. This is expected.
BLAT - The BLAST-Like Alignment Tool. W. James Kent, Genome Research 12(4): 656-664, April 2002.
BLAT Suite Program Specifications and User Guide, at the UCSC Genome website. All BLAT options are listed on this page.