A collection of Perl scripts that use BioPerl modules for bioinformatics analysis. It includes tools for processing microarray and next-generation sequencing data, converting data file formats, querying datasets, and performing general high-level analysis of datasets.
This toolbox relies on storing genome annotation, microarray, and next-generation sequencing data in local BioPerl databases, so that data can be retrieved relative to any annotated feature in the database. Referencing genomic annotation and features from a database is convenient but not required; simple BED-format input files are also supported for data collection.
The required environment variables must be set before use. The easiest way to do this is with the module commands, as in the example below.
biowulf% module avail biotoolbox

---------- /usr/local/Modules/3.2.9/modulefiles --------------------
biotoolbox/1.8.0   biotoolbox/1.8.6   biotoolbox/1.9.4

biowulf% module load biotoolbox
biowulf% module list
Currently Loaded Modulefiles:
  1) biotoolbox/1.9.4
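After loading the module, it can help to confirm that the scripts used on this page are actually on your PATH before writing batch or swarm jobs. The sketch below only checks the three scripts mentioned on this page (get_datasets.pl, bam2wig.pl, bam2gff_bed.pl); it is not a full test of the package, and the variable names are illustrative.

```shell
#!/bin/bash
# Sketch: report whether each script used on this page is on PATH.
# Run it after "module load biotoolbox". The tool list is only the scripts
# mentioned on this page, not the full package.
missing=0
for tool in get_datasets.pl bam2wig.pl bam2gff_bed.pl; do
    if location=$(command -v "$tool"); then
        echo "found:   $tool -> $location"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing script(s) not found"
```

If any script is reported missing, check that module load biotoolbox succeeded in the same shell.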
1. Create a batch script along the lines of the one below:
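#!/bin/bash
# This file is FileName
#
# Job name; send mail when the job begins and ends (-m be);
# keep the job's standard output and error files (-k oe)
#PBS -N RunName
#PBS -m be
#PBS -k oe

module load biotoolbox

cd /data/user/somewhereWithInputfile

# Collect the summed read count from data.bam for each gene in the hg19 database
get_datasets.pl --db hg19 --feature gene --data /path/to/my/data.bam --method sum --value count --out gene_count

# Convert the BAM file to a wig coverage file normalized to reads per million (--rpm)
bam2wig.pl --rpm --in data.bam

# Convert paired-end (--pe) alignments to BED format (--bed)
bam2gff_bed.pl --bed --pe --in data.bam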
2. Submit the script using the 'qsub' command on Biowulf, e.g. qsub FileName
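If a single job should process several BAM files, one option is to loop over them inside one batch script. The sketch below assumes all BAM files sit in the working directory; the job name bamConvert and the file name biotoolbox_job.sh are hypothetical. The outer code only writes the batch script and checks its syntax with bash -n; it does not submit anything.

```shell
#!/bin/bash
# Sketch: write a PBS batch script that converts every BAM file in one
# directory. Job name (bamConvert) and script name (biotoolbox_job.sh)
# are hypothetical placeholders.
set -euo pipefail

# The quoted 'EOF' keeps "$bam" literal so it is expanded only when the job runs.
cat > biotoolbox_job.sh <<'EOF'
#!/bin/bash
#PBS -N bamConvert
#PBS -m be
#PBS -k oe

module load biotoolbox
cd /data/user/somewhereWithInputfile

# Stop at the first failing command rather than carrying on.
set -e
# If no BAM files are present, skip the loop instead of passing "*.bam".
shopt -s nullglob

for bam in *.bam; do
    bam2wig.pl --rpm --in "$bam"
    bam2gff_bed.pl --bed --pe --in "$bam"
done
EOF

# Check the generated script for syntax errors without running it.
bash -n biotoolbox_job.sh
echo "Submit with: qsub biotoolbox_job.sh"
```

Processing the files one after another in a single job is simple to manage; the swarm approach runs one job per file in parallel instead.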
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file with one command per line (e.g. /data/username/swarmfile). Here is a sample file:
bam2gff_bed.pl --bed --pe --in data1.bam
bam2gff_bed.pl --bed --pe --in data2.bam
bam2gff_bed.pl --bed --pe --in data3.bam
[...]
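With many input files, the command file can be generated rather than typed by hand. This is a minimal sketch: datadir is a temporary stand-in for your data directory, and the empty data1.bam to data3.bam files are created only so the example runs anywhere. Absolute paths are written so each command does not depend on the directory it runs from.

```shell
#!/bin/bash
# Sketch: build a swarm command file with one bam2gff_bed.pl command per BAM file.
# datadir and the demo .bam files are hypothetical; point datadir at your real data.
set -euo pipefail

datadir=$(mktemp -d)    # stand-in for the directory holding your BAM files
touch "$datadir"/data1.bam "$datadir"/data2.bam "$datadir"/data3.bam   # demo inputs only

swarmfile="$datadir/swarmfile"
: > "$swarmfile"        # start with an empty command file

shopt -s nullglob
for bam in "$datadir"/*.bam; do
    # %q shell-quotes the path in case it contains spaces or special characters
    printf 'bam2gff_bed.pl --bed --pe --in %q\n' "$bam" >> "$swarmfile"
done

cat "$swarmfile"
```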
Submit this swarm with:
swarm -f swarmfile --module biotoolbox
By default, each line of the command file is executed on one processor core of a node and is allotted 1 GB of memory. If each Biotoolbox command requires more than 1 GB of memory, specify the required memory with the '-g' flag. For example, if each command requires 10 GB of memory, submit with:
swarm -g 10 -f swarmfile --module biotoolbox
For more information on running swarm, see the swarm documentation (swarm.html).
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node; instead, allocate an interactive node as shown below and run the job there.
[user@biowulf ~]$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load biotoolbox
[user@p4]$ cd /data/user/somewhereWithInputfile
[user@p4]$ get_datasets.pl --db hg19 --feature gene --data /path/to/my/data.bam --method sum --value count --out gene_count
[user@p4]$ bam2wig.pl --rpm --in data.bam
[user@p4]$ bam2gff_bed.pl --bed --pe --in data.bam
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$
Users may add a node property to the qsub command to request a specific kind of interactive node. For example, to run a job interactively on a node with 8 GB of memory, use:
biowulf% qsub -I -l nodes=1:g8