Biowulf at the NIH
BamTools on Biowulf

BamTools provides both a programmer's API and an end-user's toolkit for handling BAM files. It was developed by Derek Barnett in the Marth lab at Boston College.

 

The BamTools environment variables must be set before the program is used. The easiest way to do this is with the modules command 'module load bamtools', as in the example below.

[user@biowulf]$ module avail bamtools

-------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
bamtools/2.1.1          bamtools/2.2.0(default)

[user@biowulf]$ module load bamtools

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) bamtools/2.2.0

Submitting a single batch job

1. Create a script file along the following lines:

#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load bamtools
cd /data/user/somewhereWithInputFile
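# Convert the BAM file to FASTQ
bamtools convert -format fastq -in file.bam -out file.fastq
# Sort the BAM file; file.sorted.bam is an example output name
bamtools sort -in file.bam -out file.sorted.bam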

2. Submit the script using the 'qsub' command on Biowulf.

[user@biowulf]$ qsub -l nodes=1 /data/username/theScriptFileAbove
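
Because a batch job runs unattended, it can help to stop early when the input file is missing or empty rather than let bamtools fail partway through. The following is a minimal sketch, assuming a hypothetical helper named check_bam (it is not part of BamTools or PBS) that could be placed in the script before the bamtools lines:

```shell
#!/bin/bash
# Sketch only: check_bam is a hypothetical helper, not part of BamTools.
# It returns non-zero when the given BAM file is missing or empty, so a
# batch script can stop before calling bamtools on bad input.
check_bam() {
    local bam="$1"
    if [ ! -s "$bam" ]; then
        echo "ERROR: $bam is missing or empty" >&2
        return 1
    fi
    return 0
}
```

In the batch script it might be called as 'check_bam file.bam || exit 1' right after the cd line. The check only looks at file size; it does not validate that the file is a well-formed BAM.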

 

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile) with one command per line. Here is a sample file:

cd /data/user/somedir; bamtools convert -format fastq -in file1.bam -out file1.fastq
cd /data/user/somedir; bamtools convert -format fastq -in file2.bam -out file2.fastq
cd /data/user/somedir; bamtools convert -format fastq -in file3.bam -out file3.fastq
[...]
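
When there are many BAM files, the command file can be generated rather than typed by hand. The following is a rough sketch, assuming a hypothetical helper named make_swarm_cmdfile (not part of swarm or BamTools) and assuming that directory and file names contain no whitespace. It writes one line per .bam file in a directory, using the same layout as the sample file: cd into the directory, then run bamtools convert.

```shell
#!/bin/bash
# Sketch only: make_swarm_cmdfile is a hypothetical helper, not part of
# swarm or BamTools. It writes one "cd ...; bamtools convert ..." line
# for every .bam file found in a directory.
# Assumption: paths contain no whitespace.
make_swarm_cmdfile() {
    local bam_dir="$1"    # directory holding the input .bam files
    local cmdfile="$2"    # swarm command file to (over)write
    local bam base

    : > "$cmdfile"        # start from an empty command file
    for bam in "$bam_dir"/*.bam; do
        [ -e "$bam" ] || continue          # the glob matched nothing
        base="$(basename "$bam" .bam)"
        printf 'cd %s; bamtools convert -format fastq -in %s.bam -out %s.fastq\n' \
            "$bam_dir" "$base" "$base" >> "$cmdfile"
    done
}
```

For example, 'make_swarm_cmdfile /data/user/somedir cmdfile' would write cmdfile in the current directory. The function only writes text and does not run bamtools itself, so the result can be inspected before it is handed to swarm.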

Submit this job with:

swarm -f cmdfile --module bamtools

By default, each line of the swarm command file is executed on 1 processor and can use up to 1 GB of memory. If your commands need more than 1 GB, tell swarm how much memory is required with the -g # flag, where # is the number of GB of memory required. For example, if each command requires 4 GB of memory, submit with:

[user@biowulf]$ swarm -g 4 -f cmdfile --module bamtools

For more information regarding running swarm, see swarm.html

 

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf]$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ module load bamtools
[user@p4]$ cd /data/userID/bamtools/run1
[user@p4]$ bamtools convert -format fastq -in file1.bam -out file1.fastq
[user@p4]$ bamtools merge -in file1.bam -in file2.bam -in file3.bam -out merged.bam
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$

If you need a specific type of node, e.g. with 8 GB of memory, you can specify that on the qsub command line. For example,

qsub -I -l nodes=1:g8

Documentation

After loading the bamtools module, typing 'bamtools' with no parameters will print a simple help page.

[user@biowulf]$ module load bamtools
[user@biowulf]$ bamtools

usage: bamtools [--help] COMMAND [ARGS]

Available bamtools commands:
        convert         Converts between BAM and a number of other formats
        count           Prints number of alignments in BAM file(s)
        coverage        Prints coverage statistics from the input BAM file
        filter          Filters BAM file(s) by user-specified criteria
        header          Prints BAM header information
        index           Generates index for BAM file
        merge           Merge multiple BAM files into single file
        random          Select random alignments from existing BAM file(s), intended more as a testing tool.
        resolve         Resolves paired-end reads (marking the IsProperPair flag as needed)
        revert          Removes duplicate marks and restores original base qualities
        sort            Sorts the BAM file according to some criteria
        split           Splits a BAM file on user-specified property, creating a new BAM output file for each value found
        stats           Prints some basic statistics from input BAM file(s)

See 'bamtools help COMMAND' for more information on a specific command.

PDF documentation

Bamtools Wiki