Biowulf at the NIH
BFAST on Biowulf

Blat-like Fast Accurate Search Tool (BFAST) facilitates the fast and accurate mapping of short reads to reference sequences. Some advantages of BFAST include:

- Speed: enables billions of short reads to be mapped quickly.
- Accuracy: a priori probabilities for mapping reads with a defined set of variants.
- An easy way to measurably tune accuracy at the expense of speed.

Specifically, BFAST was designed to facilitate whole-genome resequencing, where mapping billions of short reads with variants is of utmost importance. BFAST supports both Illumina and ABI SOLiD data, as well as other next-generation sequencing technologies (454, Helicos), with particular emphasis on sensitivity to errors, SNPs, and especially indels. Other algorithms take shortcuts in order to be the "fastest": they ignore errors or certain types of variants (indels), or require further alignment, and are still not complete. BFAST can be tuned to find variants regardless of the error rate, polymorphism rate, or other factors.

Environment setup

The bfast executables need to be added to your path. The easiest way to do this is with the module commands, as in the example below.

[user@biowulf]$ module avail bfast

-------------- /usr/local/Modules/3.2.9/modulefiles ------------------
bfast/0.7.0a     bfast+bwa/0.7.0a

[user@biowulf]$ module load bfast

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) bfast/0.7.0a
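
Before running the pipeline, it can help to confirm that loading the module actually put the executables on your path. The following is a minimal sketch: the helper name check_tool is made up for this example and is not part of BFAST or the modules system. It relies only on the standard shell builtin 'command -v'.

```shell
#!/bin/bash
# check_tool is a hypothetical helper, not part of BFAST or the modules system.
# It returns 0 if the named executable can be found on $PATH, and 1 otherwise.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at: $(command -v "$1")"
    else
        echo "$1 not found on PATH; try 'module load bfast' first" >&2
        return 1
    fi
}

# Typical use after 'module load bfast' (ill2fastq.pl is the converter used in
# the batch example on this page):
#   check_tool bfast && check_tool ill2fastq.pl
```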

Submitting a single BFAST batch job

1. Create a script file containing lines similar to those below. Modify the directory path to point to your own data before running. Remember that $PATH must be set correctly first; the 'module load bfast' line in the script takes care of this.

#!/bin/bash
# This file is bfast
#
#PBS -N bfast
#PBS -m be
#PBS -k oe

module load bfast
cd /data/user/somewhereWithInputfile
ill2fastq.pl -q s <N>
bfast fasta2brg -f hg18.fa
bfast index -f hg18.fa -m <mask> -w 14 -i <index number>
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam

2. The '-n' option can be used for multi-threaded alignment. For example, the following runs localalign with 4 threads (cores) on a node. Make sure the job runs on a node with at least 4 processors; use the 'freen' command to decide which kind of node to run your job on when using the -n flag:

bfast localalign -n 4 -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf
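
If the same script may land on nodes with different core counts, one option is to cap the thread count at what the node reports. This is only a sketch: pick_threads is a hypothetical helper, and it assumes that 'getconf _NPROCESSORS_ONLN' reflects the cores your job is allowed to use, which may not hold on a shared node.

```shell
#!/bin/bash
# pick_threads is a hypothetical helper. It prints the requested thread count,
# reduced to the number of online cores if the request is larger.
pick_threads() {
    local requested="$1"
    local cores
    cores=$(getconf _NPROCESSORS_ONLN)   # cores visible on the current node
    if [ "$requested" -gt "$cores" ]; then
        echo "$cores"
    else
        echo "$requested"
    fi
}

# Possible use inside the batch script ($MATCHES and $ALIGNED are placeholder
# variables holding the .bmf and .baf file names):
#   bfast localalign -n "$(pick_threads 4)" -f hg18.fa -m "$MATCHES" > "$ALIGNED"
```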

3. Submit the script using the 'qsub' command on Biowulf, as in the example below. Users are advised to run benchmarks to determine what kind of node is suitable for their jobs.

qsub -l nodes=1:g8 /data/username/theScriptFileAbove
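
The batch script in step 1 runs every stage even if an earlier one failed, which can leave empty or partial output files behind. A small wrapper such as the one sketched here stops the job at the first failing stage. The name run_stage and the variables in the usage comments are illustrative assumptions, not part of BFAST or PBS.

```shell
#!/bin/bash
# run_stage is a hypothetical wrapper: it logs the stage name to stderr, runs
# the given command, and returns 1 if the command fails. Log messages go to
# stderr so that redirecting stdout to a result file still works.
run_stage() {
    local name="$1"
    shift
    echo "starting stage: $name" >&2
    if ! "$@"; then
        echo "stage failed: $name" >&2
        return 1
    fi
}

# Possible use in the batch script ($READS, $MATCHES and $ALIGNED are
# placeholder variables for your own file names):
#   run_stage match      bfast match -f hg18.fa -r "$READS" > "$MATCHES" || exit 1
#   run_stage localalign bfast localalign -f hg18.fa -m "$MATCHES" > "$ALIGNED" || exit 1
```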

Submitting a swarm of BFAST jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/bfastcmdfile). Here is a sample file:

cd /data/user/run1/; ill2fastq.pl -q s <N> ;\
bfast fasta2brg -f hg18.fa ; bfast index -f hg18.fa -m <mask> -w 14 -i <index number> ;\
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf;\
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf ;\
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam

cd /data/user/run2/; ill2fastq.pl -q s <N> ;\
bfast fasta2brg -f hg18.fa ; bfast index -f hg18.fa -m <mask> -w 14 -i <index number> ;\
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf;\
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf ;\
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam

..........

cd /data/user/run10/; ill2fastq.pl -q s <N> ;\
bfast fasta2brg -f hg18.fa ; bfast index -f hg18.fa -m <mask> -w 14 -i <index number> ;\
bfast match -f hg18.fa -r reads.s <N>.fastq > bfast.matches.file.s <N>.bmf;\
bfast localalign -f hg18.fa -m bfast.matches.file.s <N>.bmf > bfast.aligned.file.s <N>.baf ;\
bfast postprocess -f hg18.fa -i bfast.aligned.file.s <N>.baf > bfast.reported.file.s <N>.sam

Each command above (one per run directory) will be executed on one processor by default, so DO NOT use the '-n' flag with localalign.
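
Writing many near-identical swarm commands by hand is error-prone, so the command file can also be generated. The sketch below assumes the run directories are named /data/user/run1, /data/user/run2, and so on, as in the sample above. The function names and arguments are made up for this example. The read prefix argument stands for whatever you pass to 'ill2fastq.pl -q'.

```shell
#!/bin/bash
# make_swarm_line is a hypothetical helper that prints one swarm command
# (a single line) for one run.
#   $1 = run directory   $2 = read prefix passed to ill2fastq.pl -q
#   $3 = index mask      $4 = index number
make_swarm_line() {
    local dir="$1" prefix="$2" mask="$3" idx="$4"
    printf 'cd %s; ill2fastq.pl -q %s; ' "$dir" "$prefix"
    printf 'bfast fasta2brg -f hg18.fa; '
    printf 'bfast index -f hg18.fa -m %s -w 14 -i %s; ' "$mask" "$idx"
    printf 'bfast match -f hg18.fa -r reads.%s.fastq > bfast.matches.file.%s.bmf; ' "$prefix" "$prefix"
    printf 'bfast localalign -f hg18.fa -m bfast.matches.file.%s.bmf > bfast.aligned.file.%s.baf; ' "$prefix" "$prefix"
    printf 'bfast postprocess -f hg18.fa -i bfast.aligned.file.%s.baf > bfast.reported.file.%s.sam\n' "$prefix" "$prefix"
}

# generate_swarm_file prints one command line per run directory
# /data/user/run1 ... /data/user/run<count> (assumed directory layout).
#   $1 = number of runs; $2, $3, $4 = prefix, mask, index number as above
generate_swarm_file() {
    local count="$1" prefix="$2" mask="$3" idx="$4"
    local i=1
    while [ "$i" -le "$count" ]; do
        make_swarm_line "/data/user/run$i" "$prefix" "$mask" "$idx"
        i=$((i + 1))
    done
}

# Possible use ($PREFIX and $MASK are placeholders for your own values):
#   generate_swarm_file 10 "$PREFIX" "$MASK" 1 > /data/username/bfastcmdfile
```

Each generated command is written on a single line, so no backslash continuations are needed.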

Submit this with, for example:
swarm -f /data/username/bfastcmdfile -g 4 --module bfast
where '-g 4' tells swarm that each command in the swarm command file requires 4 GB of memory.

For more information regarding running swarm, see swarm.html

Running an interactive BFAST job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf% qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ cd /data/user/myruns
[user@p4]$ module load bfast
[user@p4]$ cd /data/userID/bfast/run1
[user@p4]$ bfast fasta2brg -f hg18.fa
[user@p4]$ bfast index -f hg18.fa -m <mask> -w 14 -i <index number>
[user@p4]$ ..........
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$

Users may add node properties to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, use:

biowulf% qsub -I -l nodes=1:g24:c16

Documentation

bfast-book.pdf