GMAP is a standalone program for mapping and aligning cDNA sequences to a genome. The program maps and aligns a single sequence with minimal startup time and memory requirements, and provides fast batch processing of large sequence sets. The program generates accurate gene structures, even in the presence of substantial polymorphisms and sequence errors, without using probabilistic splice site models.
GSNAP can align both single- and paired-end reads as short as 14 nt and of arbitrarily long length. It can detect short- and long-distance splicing, including interchromosomal splicing, in individual reads, using probabilistic models or a database of known splice sites. GSNAP also permits SNP-tolerant alignment to a reference space of all possible combinations of major and minor alleles, and can align reads from bisulfite-treated DNA for the study of methylation state. GMAP and GSNAP are developed by Thomas D. Wu and colleagues.
Prebuilt indexes for the hg19 database are available under /fdb/gmap. If you need indexes for another database, please contact staff@helix.nih.gov
The GMAP and GSNAP executables are most easily added to your path using the command 'module load gmap-gsnap', as in the example below.
$ module avail gmap-gsnap
-------------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------------------------------------
gmap-gsnap/2012-01-11    gmap-gsnap/2012-04-27    gmap-gsnap/2012-05-24    gmap-gsnap/2012-06-06    gmap-gsnap/2012-07-20
gmap-gsnap/2012-03-21    gmap-gsnap/2012-05-07    gmap-gsnap/2012-06-02    gmap-gsnap/2012-06-20    gmap-gsnap/2012-07-20-v2
gmap-gsnap/2013-01-23

$ module load gmap-gsnap
$ module list
Currently Loaded Modulefiles:
  1) gmap-gsnap/2013-01-23

$ module unload gmap-gsnap
$ module load gmap-gsnap/2012-06-06
$ module list
Currently Loaded Modulefiles:
  1) gmap-gsnap/2012-06-06

$ module show gmap-gsnap
------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/gmap-gsnap/2013-01-23:

module-whatis    Sets up gmap-gsnap 2013-01-23
prepend-path     PATH /usr/local/apps/gmap-gsnap/2013-01-23/bin
-------------------------------------------------------------------
1. Create a script file containing lines similar to those below.
#!/bin/bash
# This file is gmapscript
#
#PBS -N gmap
#PBS -m be
#PBS -k oe

module load gmap-gsnap
cd /data/user/mydir
gmap -D /fdb/gmap/hg19 -d hg19 cdna1234.fa
2. Submit the script using the 'qsub' command on Biowulf.
qsub -l nodes=1 /data/username/gmapscript
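GMAP prints its alignments on standard output, so a batch script may want to save them to a named file and stop early if the input is missing. The following is a minimal sketch of that idea, not part of the site's standard script: the function name map_cdna, the GMAP variable, and the .gmap.out output suffix are illustrative assumptions.

```shell
#!/bin/bash
# Sketch of a more defensive gmapscript body.
# map_cdna, GMAP and the ".gmap.out" suffix are hypothetical names.

GMAP=${GMAP:-gmap}    # normally the gmap found on PATH after 'module load gmap-gsnap'

map_cdna() {
    local fasta=$1
    local out="${fasta%.fa}.gmap.out"

    # Refuse to run if the FASTA file is missing or empty.
    if [ ! -s "$fasta" ]; then
        echo "input file $fasta is missing or empty" >&2
        return 1
    fi

    # Save the alignments that gmap writes to standard output.
    "$GMAP" -D /fdb/gmap/hg19 -d hg19 "$fasta" > "$out"
}

# In the batch script, after 'module load gmap-gsnap' and 'cd /data/user/mydir':
#   map_cdna cdna1234.fa || exit 1
```

With this layout the results for cdna1234.fa would end up in cdna1234.gmap.out in the working directory rather than in the job's standard output file.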
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g., /data/username/cmdfile). Here is a sample file that maps cDNAs from a set of files to a genome.
module load gmap-gsnap; cd /data/user/mydir; gmap -D /fdb/gmap/hg19 -d hg19 cdna1.fa
module load gmap-gsnap; cd /data/user/mydir; gmap -D /fdb/gmap/hg19 -d hg19 cdna2.fa
module load gmap-gsnap; cd /data/user/mydir; gmap -D /fdb/gmap/hg19 -d hg19 cdna3.fa
[...]
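For a large set of input files, the command file can be generated rather than typed by hand. The sketch below writes one line per FASTA file in the same form as the sample above. The function name make_gmap_cmdfile and the cdna*.fa file pattern are assumptions for illustration; adjust them to your own naming scheme.

```shell
#!/bin/bash
# Sketch: write one swarm command line per cDNA FASTA file.
# make_gmap_cmdfile and the cdna*.fa pattern are hypothetical.

make_gmap_cmdfile() {
    local workdir=$1 cmdfile=$2 fasta

    : > "$cmdfile"                      # start with an empty command file
    for fasta in "$workdir"/cdna*.fa; do
        [ -e "$fasta" ] || continue     # nothing matched the pattern
        printf 'module load gmap-gsnap; cd %s; gmap -D /fdb/gmap/hg19 -d hg19 %s\n' \
            "$workdir" "$(basename "$fasta")" >> "$cmdfile"
    done
}

# Example call (paths are placeholders):
#   make_gmap_cmdfile /data/user/mydir /data/username/cmdfile
```

Each generated line is independent of the others, which is what lets swarm run them concurrently.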
Submit this swarm with
swarm -f cmdfile
By default, each command line above is executed on one processor core of a node and can use up to 1 GB of memory. If each GMAP command requires more than 1 GB of memory, specify the required memory with swarm's -g # flag, where # is the number of GB of memory required. For example, if each gmap command above requires 3 GB of memory, submit with:
swarm -g 3 -f cmdfile
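One way to choose a value for -g is to measure the peak memory of a single representative command and round it up to whole gigabytes. The sketch below assumes GNU time is installed at /usr/bin/time (its -v report includes a "Maximum resident set size (kbytes)" line). The helper name kb_to_swarm_gb is hypothetical.

```shell
#!/bin/bash
# Sketch: convert a peak memory figure in kilobytes to whole gigabytes,
# rounding up. kb_to_swarm_gb is a hypothetical helper name.

kb_to_swarm_gb() {
    local kb=$1
    # 1 GB = 1048576 KB; adding 1048575 before dividing rounds up
    echo $(( (kb + 1048575) / 1048576 ))
}

# Example, run on an interactive node rather than the login node
# (assumes GNU time at /usr/bin/time):
#   /usr/bin/time -v -o time.log gmap -D /fdb/gmap/hg19 -d hg19 cdna1.fa > cdna1.out
#   kb=$(awk -F': ' '/Maximum resident set size/ {print $2}' time.log)
#   swarm -g "$(kb_to_swarm_gb "$kb")" -f cmdfile
```

Memory use can vary from one input to another, so it may be worth measuring the largest input or adding some headroom to the result.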
For more information regarding running swarm, see swarm.html
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below, and run the interactive job there.
biowulf% qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load gmap-gsnap
[user@p4]$ gmap -D /fdb/gmap/hg19 -d hg19 myfastafile
[user@p4]$ [other commands...........]
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf ~]$
You can add a node property to the qsub command to request a specific kind of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this:
biowulf% qsub -I -l nodes=1:g24:c16


