BedTools on Biowulf

The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. In addition, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.
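
For example, the output of one BEDTools command can be piped directly into another, avoiding intermediate files (a minimal sketch; the BAM and BED file names are placeholders):

# stream alignments from bamToBed straight into intersectBed (reads.bam and genes.bed are placeholders)
bamToBed -i reads.bam | intersectBed -a stdin -b genes.bed > overlapping_reads.bed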

There are several versions of bedtools maintained on the system. The easiest way to check which versions are available, and to load a particular version, is with the module commands, as in the example below:
biowulf% module avail bedtools

------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
bedtools/2.17.0(default) bedtools/2.5.1           bedtools/2.7.1

biowulf% module load bedtools

biowulf% module list
Currently Loaded Modulefiles:
  1) bedtools/2.17.0

biowulf% module unload bedtools

biowulf% module load bedtools/2.7.1

biowulf% module list
Currently Loaded Modulefiles:
  1) bedtools/2.7.1

Submitting a single batch job

1. Create a script file along the lines of the one below:

#!/bin/bash
# This file is YourOwnFileName
#
# -N sets the job name; -m be sends mail at job begin and end; -k oe keeps stdout/stderr files
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load bedtools

cd /data/user/somewhereWithInputFile
bamToBed -i input.bam > output.bed

2. Submit the script using the 'qsub' command on Biowulf.

[user@biowulf]$ qsub -l nodes=1 /data/username/theScriptFileAbove

This will submit the job to a node with at least 2 cores and at least 1 GB of memory. If your bedtools job requires more memory, you can specify the required amount on the qsub command line, e.g.

[user@biowulf]$ qsub -l nodes=1:g24 /data/username/theScriptFileAbove
will submit the job to a node with 24 GB ('g24') of memory. Use 'freen' to see available node types.
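
The same batch-script layout works for other BEDTools commands. For instance, the last line of the script in step 1 could instead report genome-wide coverage in bedGraph format (a hedged sketch; the -ibam/-bg options and file names are assumptions, to be checked against genomeCoverageBed -h for the loaded version):

# input.bam should be sorted by position; writes per-base coverage as bedGraph
genomeCoverageBed -ibam input.bam -bg > coverage.bedgraph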

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

module load bedtools; cd /data/user/myfiles; bamToBed -i file1.bam > file1.bed
module load bedtools; cd /data/user/myfiles; bamToBed -i file2.bam > file2.bed
module load bedtools; cd /data/user/myfiles; bamToBed -i file3.bam > file3.bed
[...]
module load bedtools; cd /data/user/myfiles; bamToBed -i file25.bam > file25.bed
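
Rather than typing each line, a command file like this can be generated with a short shell loop (a minimal sketch using the placeholder paths from the sample above):

# write one bamToBed command per BAM file into the swarm command file
for i in $(seq 1 25); do
    echo "module load bedtools; cd /data/user/myfiles; bamToBed -i file${i}.bam > file${i}.bed"
done > /data/username/cmdfile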

Submit this job with

swarm -f cmdfile

This will run each line of the command file on a single core of a node, using at most 1 GB of memory per command. If each bedtools command (i.e. each line in the swarm file above) requires more than 1 GB of memory, use swarm's -g flag to specify the memory required, e.g.

swarm -g 5 -f cmdfile

will tell swarm that each command requires 5 GB of memory.

For more information regarding running swarm, see swarm.html

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf] $ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@p4]$ cd /data/user/myruns
[user@p4]$ module load bedtools
[user@p4]$ cd /data/userID/bedtools/run1
[user@p4]$ bamToBed -i input.bam > output.bed
[user@p4]$ ...........
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$

The qsub command above will allocate a node with at least 1 GB of memory. If you need more than that, you can specify the memory requirement on the qsub command line, e.g.

[user@biowulf]$ qsub -I -l nodes=1:g24:c16

will allocate a node with 24 GB of memory ('g24') and 16 cores ('c16').

Documentation

http://code.google.com/p/bedtools/#Example_Usage