Biowulf at the NIH
GMAP/GSNAP on Biowulf

GMAP is a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models.

GSNAP can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. GSNAP also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state. GMAP and GSNAP are developed by Thomas D. Wu and colleagues.

Prebuilt indexes for the hg19 genome database are available under /fdb/gmap. If you need indexes for another database, please contact staff@helix.nih.gov

The GMAP and GSNAP executables are most easily added to your path using the command 'module load gmap-gsnap', as in the example below.

$ module avail gmap-gsnap
-------------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------------------------------------
gmap-gsnap/2012-01-11    gmap-gsnap/2012-04-27    gmap-gsnap/2012-05-24    gmap-gsnap/2012-06-06    gmap-gsnap/2012-07-20
gmap-gsnap/2012-03-21    gmap-gsnap/2012-05-07    gmap-gsnap/2012-06-02    gmap-gsnap/2012-06-20    gmap-gsnap/2012-07-20-v2
gmap-gsnap/2013-01-23

$ module load gmap-gsnap

$ module list
Currently Loaded Modulefiles:
  1) gmap-gsnap/2013-01-23

$ module unload gmap-gsnap

$ module load gmap-gsnap/2012-06-06
$ module list
Currently Loaded Modulefiles:
  1) gmap-gsnap/2012-06-06

$ module show gmap-gsnap
------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/gmap-gsnap/2013-01-23:

module-whatis    Sets up gmap-gsnap 2013-01-23
prepend-path     PATH /usr/local/apps/gmap-gsnap/2013-01-23/bin
-------------------------------------------------------------------
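
A job script can fail in confusing ways if the module was not loaded, so it may help to check for the executables up front. The following is a minimal sketch; require_tool is a hypothetical helper name, not part of the module system or of GMAP/GSNAP.

```shell
#!/bin/bash
# Hypothetical guard (not provided by Biowulf or GMAP): return non-zero and
# print a hint if the named command cannot be found on PATH.
require_tool() {
    if ! command -v "$1" > /dev/null 2>&1; then
        echo "error: '$1' not found on PATH; try 'module load gmap-gsnap'" >&2
        return 1
    fi
}

# Example use near the top of a job script:
#   module load gmap-gsnap
#   require_tool gmap  || exit 1
#   require_tool gsnap || exit 1
```

The check uses only the shell builtin 'command -v', so it behaves the same whichever gmap-gsnap version is loaded.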

Submitting a single batch job

1. Create a script file containing lines similar to those below.

#!/bin/bash
# This file is gmapscript
#
#PBS -N gmap
#PBS -m be
#PBS -k oe

module load gmap-gsnap

cd /data/user/mydir
gmap -D /fdb/gmap/hg19 -d hg19  cdna1234.fa

2. Submit the script using the 'qsub' command on Biowulf.

qsub -l nodes=1 /data/username/gmapscript
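
The script above runs GMAP only. For paired-end reads, a GSNAP call could be wrapped along the following lines. This is a sketch under assumptions: gsnap_pair is a hypothetical function name, the read and output file names are placeholders, and it assumes the hg19 index under /fdb/gmap can be used by gsnap as well as gmap. The options shown (-D, -d, -t, -A sam) are standard gsnap options.

```shell
#!/bin/bash
# Hypothetical wrapper: align one pair of FASTQ files with GSNAP, writing SAM.
# Assumes 'module load gmap-gsnap' has already been run in this shell.
# Usage: gsnap_pair <reads_1.fq> <reads_2.fq> <out.sam> [threads]
gsnap_pair() {
    local r1=$1 r2=$2 out=$3 threads=${4:-1}
    # -D: directory holding the index    -d: genome database name
    # -t: number of worker threads       -A sam: write SAM-format output
    gsnap -D /fdb/gmap/hg19 -d hg19 -t "$threads" -A sam "$r1" "$r2" > "$out"
}

# Example inside a batch script, after 'module load gmap-gsnap':
#   cd /data/user/mydir
#   gsnap_pair sample_1.fq sample_2.fq sample.sam 4
```

The thread count is optional and defaults to 1.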

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g., /data/username/cmdfile). Here is a sample file that maps a set of cDNA files to a genome.

module load gmap-gsnap; cd /data/user/mydir; gmap -D /fdb/gmap/hg19 -d hg19  cdna1.fa 
module load gmap-gsnap; cd /data/user/mydir; gmap -D /fdb/gmap/hg19 -d hg19  cdna2.fa 
module load gmap-gsnap; cd /data/user/mydir; gmap -D /fdb/gmap/hg19 -d hg19  cdna3.fa 
[...]

Submit this swarm with

swarm -f cmdfile

By default, each command line in the file runs on one processor core of a node and can use up to 1 GB of memory. If each GMAP command requires more than 1 GB of memory, specify the required memory with swarm's -g # flag, where # is the number of GB required. For example, if each gmap command above requires 3 GB of memory, submit with:

swarm -g 3 -f cmdfile

For more information regarding running swarm, see swarm.html
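
With many input files, the swarm command file can be generated rather than typed by hand. A minimal sketch, assuming the inputs are the *.fa files in a single directory; make_gmap_cmdfile is a hypothetical helper name, and the paths in the usage comment are placeholders.

```shell
#!/bin/bash
# Hypothetical helper (not part of swarm or GMAP): write one swarm command
# line per FASTA file found in a directory.
# Usage: make_gmap_cmdfile <input_dir> <cmdfile>
make_gmap_cmdfile() {
    local indir=$1 cmdfile=$2 fa
    : > "$cmdfile"                     # start from an empty command file
    for fa in "$indir"/*.fa; do
        [ -e "$fa" ] || continue       # nothing matched: skip the literal pattern
        printf 'module load gmap-gsnap; cd %s; gmap -D /fdb/gmap/hg19 -d hg19 %s\n' \
            "$indir" "$(basename "$fa")" >> "$cmdfile"
    done
}

# Example:
#   make_gmap_cmdfile /data/user/mydir /data/user/cmdfile
#   swarm -f /data/user/cmdfile
```

Each generated line has the same form as the sample command file above, so it can be submitted with the same swarm options.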

Running an interactive job

Users may need to run jobs interactively sometimes. Such jobs should not be run on the Biowulf login node. Instead allocate an interactive node as described below, and run the interactive job there.

biowulf% qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
      
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load gmap-gsnap
[user@p4]$ gmap -D /fdb/gmap/hg19 -d hg19 myfastafile
 
[user@p4]$ [other commands...........]

[user@p4]$ exit 
qsub: job 2236960.biobos completed
[user@biowulf ~]$

You can add a node property to the qsub command to request a specific kind of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, use:

biowulf% qsub -I -l nodes=1:g24:c16

Documentation

http://research-pub.gene.com/gmap/