Blat-like Fast Accurate Search Tool (BFAST) facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:
- Speed: enables billions of short reads to be mapped quickly.
- Accuracy: a priori probabilities for mapping reads with a defined set of variants.
- An easy way to measurably tune accuracy at the expense of speed.
Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance. BFAST supports both Illumina and ABI SOLiD data, as well as other next-generation sequencing technologies (454, Helicos), with particular emphasis on sensitivity towards errors, SNPs and especially indels. Other algorithms take shortcuts by ignoring errors or certain types of variants (indels), and some even require further alignment, all to be the "fastest" (but still not complete). BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.
/usr/local/bfast/bin
The environmental variable(s) need to be set correctly first:
The bfast executables need to be added to your path. The easiest way to do this is by using the modules commands, as in the example below.
If you use this application very often, you can set the environmental variables in your /home/userID/.bashrc or /home/userID/.cshrc file so that it will be done automatically when you log in and you don't need to set the environmental variable(s) every time.
For bash users:
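A minimal sketch for bash, assuming the executables live in /usr/local/bfast/bin (the location listed above). The variable name BFAST_BIN and the guard against duplicate entries are illustrative additions, not part of the original instructions. Lines like these could go in ~/.bashrc:

```shell
# Assumption: BFAST is installed in /usr/local/bfast/bin (the location listed above).
BFAST_BIN=/usr/local/bfast/bin   # hypothetical variable name, used only in this sketch

# Prepend the directory only if it is not already on PATH,
# so that re-sourcing the file does not keep growing PATH.
case ":${PATH}:" in
  *":${BFAST_BIN}:"*) ;;                 # already present, nothing to do
  *) PATH="${BFAST_BIN}:${PATH}" ;;      # otherwise put it at the front
esac
export PATH
```

Using 'module load bfast' instead, as in the batch examples, avoids hard-coding the path.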
For tcsh/csh users:
1. Create a script file containing lines similar to those below. Modify the paths before running. Remember, the $PATH environmental variable has to be set correctly first.
#!/bin/bash
# This file is bfast
#
#PBS -N bfast
#PBS -m be
#PBS -k oe
module load bfast
cd /data/user/somewhereWithInputfile
ill2fastq.pl -q s <N>
bfast fasta2brg -f hg18.fa
bfast index -f hg18.fa -m <mask> -w 14 -i <index number>
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam
2. Note that the '-n' option can be used for multi-threaded alignment. For example, the following runs localalign with 4 threads on one node. Make sure the job runs on a node with at least 4 processors (cores); use the 'freen' command to decide which kind of node to request when using the -n flag:
bfast localalign -n 4 -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf
3. Submit the script using the 'qsub' command on Biowulf, as in the example below. Users are encouraged to run benchmarks to determine which kind of node is suitable for their jobs.
qsub -l nodes=1:g8 /data/username/theScriptFileAbove
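As a sketch related to step 2, the thread count for '-n' could be derived from the node instead of being hard-coded. The snippet below reads the number of online cores with getconf and prints (but does not run) the corresponding localalign command. The variable names are hypothetical, and the batch system may allocate fewer cores than getconf reports on a shared node, so treat the value as an upper bound:

```shell
# Number of online cores on this node, as reported by getconf.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Fall back to a single thread if the value is empty or not a plain number.
case "$threads" in
  ''|*[!0-9]*) threads=1 ;;
esac

# Build the command as text only; the <N> placeholder is kept literally
# and must be replaced before the command is actually run.
cmd="bfast localalign -n ${threads} -f hg18.fa -m bfast.matches.file.s <N>.bmf"
printf '%s\n' "$cmd"
```

Printing the command first makes it easy to check the chosen thread count against the node type requested with qsub.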
Useful commands:
freen: see http://biowulf.nih.gov/user_guide.html#freen
qstat: search for 'qstat' on http://biowulf.nih.gov/user_guide.html for its usage.
jobload: search for 'jobload' on http://biowulf.nih.gov/user_guide.html for its usage.
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/username/bfastcmdfile). Here is a sample file:
module load bfast ;\
cd /data/user/run1/; ill2fastq.pl -q s <N> ;\
bfast fasta2brg -f hg18.fa ;\
bfast index -f hg18.fa -m <mask> -w 14 -i <index number> ;\
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf ;\
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf ;\
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam
module load bfast ;\
cd /data/user/run2/; ill2fastq.pl -q s <N> ;\
bfast fasta2brg -f hg18.fa ;\
bfast index -f hg18.fa -m <mask> -w 14 -i <index number> ;\
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf ;\
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf ;\
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam
..........
module load bfast ;\
cd /data/user/run10/; ill2fastq.pl -q s <N> ;\
bfast fasta2brg -f hg18.fa ;\
bfast index -f hg18.fa -m <mask> -w 14 -i <index number> ;\
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf ;\
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf ;\
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam
Each command line above will be executed on one processor by default, so do NOT use the '-n' flag with localalign.
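Writing one entry per run directory by hand is error-prone, so a command file like the sample above could also be generated with a short loop. This is only a sketch: the output name bfastcmdfile and the run1 to run10 directory layout follow the sample, the variable names are hypothetical, and the <N>, <mask> and <index number> placeholders are written out literally and still have to be filled in before submission.

```shell
# Assumption: run directories are /data/user/run1 ... /data/user/run10, as in the sample.
cmdfile=bfastcmdfile
: > "$cmdfile"                       # start from an empty file

# The per-run pipeline, built up one step at a time.
# Inside double quotes, '<' and '>' are plain text, not redirections.
steps="ill2fastq.pl -q s <N>"
steps="$steps; bfast fasta2brg -f hg18.fa"
steps="$steps; bfast index -f hg18.fa -m <mask> -w 14 -i <index number>"
steps="$steps; bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf"
steps="$steps; bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf"
steps="$steps; bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam"

i=1
while [ "$i" -le 10 ]; do
  # One physical line per run: swarm treats each line as an independent job.
  printf 'module load bfast; cd /data/user/run%s/; %s\n' "$i" "$steps" >> "$cmdfile"
  i=$((i + 1))
done
```

Each run ends up on a single line, so no ';\' continuations are needed in the generated file.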
One swarm flag, '-f', is required. Two others, '-t' and '-g', are the flags users will most likely need to specify when submitting a swarm job.
-f: the swarm command file name above (required)
-t: number of processors per node to use for each line of the commands in the swarm file above (optional)
-g: GB of memory needed for each line of the commands in the swarm file above (optional)
By default, each line of the commands above will be executed on '1' processor core of a node and uses 1GB of memory. If this is not what you want, you will need to specify '-t' and '-g' flags when you submit the job on biowulf.
For example, if each command line above needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:
biowulf> $ swarm -g 10 -f cmdfile
For more information regarding running swarm, see swarm.html
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load bfast
[user@p4]$ cd /data/userID/bfast/run1
[user@p4]$ bfast fasta2brg -f hg18.fa
[user@p4]$ bfast index -f hg18.fa -m <mask> -w 14 -i <index number>
[user@p4]$ ..........
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$
Users may add a node property to the qsub command to request a specific kind of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this:


