Hotspot on Biowulf

Hotspot is a program for identifying regions of local enrichment of short-read sequence tags mapped to the genome, using a binomial distribution model. Regions flagged by the algorithm are called "hotspots." The algorithm uses a local background model that automatically normalizes for large regions of elevated tag levels due to, for example, copy-number effects. Hotspot can also detect regions of enrichment of highly variable size, making it applicable to both broad and highly punctate signals.
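The flavor of the test is binomial: roughly speaking, for a target window lying inside a larger local background window, Hotspot asks how surprising the observed tag count is if background tags were placed uniformly. A sketch of the general form follows (the exact window sizes and the mappability correction are described in the Hotspot README):

% k = tag count in the target window of width w
% n = tag count in the surrounding local background window of width W
% p = w / W, the chance a uniformly placed background tag lands in the target window
P(X \ge k) = \sum_{i=k}^{n} \binom{n}{i} p^{i} (1-p)^{n-i},
\qquad
z = \frac{k - np}{\sqrt{np(1-p)}}

Windows with a large z-score (equivalently, a small binomial tail probability) are the candidate hotspots.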

Hotspot was developed by John et al. at the NIH and the University of Washington, Seattle. [Hotspot paper]

Setting up your environment

The easiest way to set up your environment is to use environment modules by typing

module load hotspot
on the command line or in your batch script.
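For example, to load the module and confirm that the test-script directory variable used in the next section has been set:

biowulf% module load hotspot
biowulf% echo $HOTSPOT_TEST_DIR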

Running a single hotspot batch job

The example below uses the sample pipeline scripts in $HOTSPOT_TEST_DIR, an environment variable that is set when you load the module. The runhotspot script in that directory has been modified to use the correct paths for Biowulf.

First, copy the file runall.tokens.txt to your own working directory. This file contains the parameters, input files, and other settings that you will want to modify for your own jobs.

biowulf% module load hotspot
biowulf% cp $HOTSPOT_TEST_DIR/runall.tokens.txt .
biowulf% cp $HOTSPOT_TEST_DIR/runhotspot .
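The tokens file is a list of name = value settings. As a rough illustration of the format only (the token names and values below are hypothetical placeholders; the authoritative list is in the copy of runall.tokens.txt you just made):

_TAGS_ = /data/username/mydir/mytags.bam
_GENOME_ = hg19
_OUTDIR_ = /data/username/mydir/hotspot-output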

Edit runall.tokens.txt to set the parameters as you like. Then create a batch script along the following lines:

#!/bin/bash
# this file is called hotspot.bat

module load hotspot
cd /data/username/mydir

./runhotspot
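Note that cp normally preserves the execute permission, but if your copy of runhotspot is not executable, the ./runhotspot line above will fail; in that case, mark it executable first:

biowulf% chmod +x runhotspot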

Submit this job with

qsub -l nodes=1 hotspot.bat
If you need more than the default 1 GB of memory, pick a node type with the appropriate amount of memory and specify it on the qsub command line. You can use the 'freen' command to see the available node types. For example, if your hotspot job requires 20 GB of memory, submit with

qsub -l nodes=1:g24 hotspot.bat

If you don't know how much memory your job will require, it is reasonable to start by submitting a single job to a 24 GB node. The standard output from the job will report the amount of memory used, and you can use that information to size future jobs.
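For example, to check available node types before submitting, and then to monitor your queued or running job:

biowulf% freen
biowulf% qstat -u $USER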

Documentation

Hotspot website.

Hotspot README file with details.