Biowulf at the NIH
BamUtil on Biowulf

BamUtil is a collection of programs that perform operations on SAM/BAM files. All of the programs are built into a single executable, bam.

BamUtil was developed in the Abecasis Lab at the University of Michigan.

The available version(s) of bamutil can be listed with 'module avail bamutil', and the bamutil executable can be added to your path with 'module load bamutil', as in the example below.

[user@biowulf modulefiles]$ module avail bamutil

------------------ /usr/local/Modules/3.2.9/modulefiles ------------------

[user@biowulf modulefiles]$ module load bamutil

Submitting a Single Batch Job

Create a script file along the following lines:

#!/bin/bash
# This file is runbamutil
#PBS -N RunBam
#PBS -m be
#PBS -k oe

module load bamutil

cd /data/user/somewhereWithInputfile

# convert SAM to BAM
bam convert --in myfile.sam --out myfile.bam
# --out is a base name; one BAM file per chromosome is written
bam splitChromosome --in myfile.bam --out myfile
# report differences between two sorted BAM files, including mapping quality
bam diff --mapQual --in1 file1.bam --in2 file2.bam

Submit the script using the 'qsub' command on Biowulf.

qsub -l nodes=1 /data/username/runbamutil

Submitting a Swarm of Jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

cd /data/user/mydir; bam convert --in file1.bam --out file1.sam
cd /data/user/mydir; bam convert --in file2.bam --out file2.sam
cd /data/user/mydir; bam convert --in file3.bam --out file3.sam
cd /data/user/mydir; bam convert --in file4.bam --out file4.sam
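For larger batches, the command file can be generated with a short shell loop rather than written by hand. This is a sketch using the placeholder file names and directory from the sample above; substitute your own data:

```shell
# Emit one swarm command per input file; each line becomes one cluster job.
# file1..file4 and /data/user/mydir are placeholders from the sample above.
for f in file1 file2 file3 file4; do
    echo "cd /data/user/mydir; bam convert --in ${f}.bam --out ${f}.sam"
done > cmdfile
```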

Submit this swarm with

swarm -f cmdfile --module bamutil

This will submit a swarm of jobs so that each of the commands above runs on a single core using up to 1 GB of memory. If the commands will require more than 1 GB of memory each, you need to specify that to swarm using the -g # flag, where # is the number of GB of memory required. For example:

swarm -g 3 -f cmdfile --module bamutil

will submit a swarm so that each command can use up to 3 GB of memory.

For more information on running swarm, see swarm.html

Running an Interactive Job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

biowulf% qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ cd /data/user/myruns
[user@p4]$ module load bamutil
[user@p4]$ cd /data/user/somewhereWithInputfile
[user@p4]$ bam splitBam -v -i myfile.bam -o outfile -L logfile
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$

A specific type of node (e.g. one with more memory) can be requested on the qsub command line. For example, if you need a node with 8 GB of memory to run a job interactively, use:

biowulf% qsub -I -l nodes=1:g8

Type 'freen' to see available types of nodes.