The BEDTools utilities allow one to address common genomics tasks such as finding feature overlaps and computing coverage. In addition, one can develop sophisticated pipelines that answer complicated research questions by "streaming" several BEDTools together.
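As a minimal sketch of such streaming, the commands below pipe the output of one tool directly into the next. The file names and coordinates are made up for illustration, and the sketch assumes bedtools has already been loaded so that intersectBed, sortBed and mergeBed are on the PATH; if they are not, it only prints a reminder.

```shell
#!/bin/bash
# Two tiny BED files with hypothetical coordinates, for illustration only.
printf 'chr1\t100\t200\nchr1\t150\t300\nchr1\t500\t600\n' > a.bed
printf 'chr1\t120\t180\nchr1\t250\t260\n' > b.bed

if command -v intersectBed >/dev/null 2>&1; then
    # Keep the parts of a.bed that overlap b.bed, sort them,
    # then merge overlapping intervals; each tool reads the previous
    # tool's output from standard input ("-i stdin").
    intersectBed -a a.bed -b b.bed \
        | sortBed -i stdin \
        | mergeBed -i stdin > merged.bed
else
    echo "bedtools is not on PATH; load the bedtools module first" >&2
fi
```

No intermediate files are written between the steps; only the final merged.bed lands on disk.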
There are several versions of bedtools maintained on the system. The easiest way to check which versions are available and load a particular version is by using the modules utilities, as in the example below:
biowulf% module avail bedtools
------------------- /usr/local/Modules/3.2.9/modulefiles ---------------------
bedtools/2.17.0(default) bedtools/2.5.1 bedtools/2.7.1
biowulf% module load bedtools
biowulf% module list
Currently Loaded Modulefiles:
1) bedtools/2.17.0
biowulf% module unload bedtools
biowulf% module load bedtools/2.7.1
biowulf% module list
Currently Loaded Modulefiles:
1) bedtools/2.7.1
1. Create a script file along the lines of the one below:
#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe
module load bedtools
cd /data/user/somewhereWithInputFile
bamToBed -i input.bam >output.bed
2. Submit the script using the 'qsub' command on Biowulf.
This will submit the job to a node with at least 2 cores and at least 1 GB of memory. If your bedtools job requires more memory, you can specify the required memory on the qsub command line, e.g.
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:
module load bedtools; cd /data/user/myfiles; bamToBed -i file1.bam > file1.bed
module load bedtools; cd /data/user/myfiles; bamToBed -i file2.bam > file2.bed
module load bedtools; cd /data/user/myfiles; bamToBed -i file3.bam > file3.bed
[...]
module load bedtools; cd /data/user/myfiles; bamToBed -i file25.bam > file25.bed
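Rather than typing every line by hand, a command file of this shape can be generated with a short loop. The sketch below assumes 25 inputs named file1.bam through file25.bam in /data/user/myfiles, converted to matching .bed files, and writes to a file called cmdfile in the current directory; adjust the names and count to your own data.

```shell
#!/bin/bash
# Assumed locations and names; change these to match your own files.
workdir=/data/user/myfiles
cmdfile=cmdfile

: > "$cmdfile"   # start with an empty command file

# Write one self-contained command line per input BAM file.
for i in $(seq 1 25); do
    echo "module load bedtools; cd $workdir; bamToBed -i file$i.bam > file$i.bed" >> "$cmdfile"
done

wc -l < "$cmdfile"   # report how many commands were written
```

Each line stays independent of the others, which is what lets swarm run them concurrently.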
Submit this job with
swarm -f cmdfile
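If each command needs more memory than the default, use the -g flag to request memory in GB per process, for example:
swarm -g 5 -f cmdfile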
For more information regarding running swarm, see swarm.html
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load bedtools
[user@p4]$ cd /data/userID/bedtools/run1
[user@p4]$ bamToBed -i input.bam >output.bed
[user@p4]$ ...........
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$
The qsub command above will allocate a node with at least 1 GB of memory. If you need more than that, you can specify the memory requirement on the qsub command line, e.g.
will allocate a node with 24 GB of memory.


