Biowulf at the NIH
FusionMap on Biowulf

FusionMap is an efficient fusion aligner that aligns reads spanning fusion junctions directly to the genome, without prior knowledge of potential fusion regions. It detects and characterizes fusion junctions at base-pair resolution. FusionMap can be applied to detect fusion junctions in both single- and paired-end datasets from either gDNA-Seq or RNA-Seq studies.

How To Use

First set your environment for using FusionMap:

module load fusionmap

Due to licensing restrictions, only the most current version of FusionMap can be made available. See the Biowulf modules documentation for more information on modules.

FusionMap is run with an input configuration file. Examples are found in $FMBIN/../TestDataset. Text following '//' is treated as a comment and is ignored by FusionMap.

Note that the file paths in the example configuration files must be changed to match your own data.
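
Because '//' starts a comment, it can help to view a configuration file with the comments removed when checking which settings are active. The following is a minimal sketch and not part of FusionMap: the function name strip_fm_comments is hypothetical, and it assumes that '//' never appears inside a real value (a value containing '//' would be truncated).

```shell
#!/bin/bash
# Hypothetical helper (not part of FusionMap): print a configuration file
# with '//' comments and blank lines removed.
# Assumption: '//' never occurs inside a real value.
strip_fm_comments() {
    # 1st expression: drop everything from '//' to end of line,
    #                 including the whitespace just before it
    # 2nd expression: delete lines that are empty after that
    sed -e 's|[[:space:]]*//.*$||' -e '/^[[:space:]]*$/d' "$@"
}
```

It can be used as: strip_fm_comments /path/to/input/configuration/file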

Define a directory as the Base_Directory. The Base_Directory holds the downloaded or built reference and index files needed to run FusionMap. If FusionMap cannot find the required files there, it will attempt to download them. NOTE: Downloading will fail if attempted from a cluster node, because the Biowulf cluster is NOT connected to the internet. Any run that needs to download files must be done on Helix.

The Helix Systems staff maintains a small number of commonly used reference libraries available in /fdb/fusionmap:

Reference Build    Gene Model
Human.B37          RefGene
Human.B37          Ensembl.R70
Human.B37          UcscGene20130723
Human.B37.3        RefGene
Human.B37.3        Ensembl.R73
Human.B37.3        UcscGene20130723
Human.hg19         RefGene
Human.hg19         Ensembl.R73
Human.hg19         UcscGene20130723
Mouse.B38          RefGene
Mouse.B38          Ensembl.R73
Mouse.B38          UcscGene20130723
Mouse.mm10         RefGene
Mouse.mm10         Ensembl.R73
Mouse.mm10         UcscGene20130723

Please contact staff@helix.nih.gov if you would like an additional reference library installed.

NOTE on gene filters: If you build your own reference library, you will need to copy a set of files into the Base_Directory for filtering the genes:

cp -R /fdb/fusionmap/Fusion /path/to/[user-defined Base_Directory]

where [user-defined Base_Directory] is the Base_Directory you created for your reference library.
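
The two steps above (creating a Base_Directory and copying in the gene filter files) can be combined in a small helper. This is only a sketch: the function name setup_base_dir and its optional second argument are illustrative assumptions, and the default source path is the /fdb/fusionmap/Fusion location given above.

```shell
#!/bin/bash
# Sketch: create a user-defined Base_Directory and copy the gene filter
# files into it. setup_base_dir is a hypothetical name, not a FusionMap tool.
setup_base_dir() {
    local base_dir="$1"
    local fusion_src="${2:-/fdb/fusionmap/Fusion}"   # default from the text above

    if [ -z "$base_dir" ]; then
        echo "usage: setup_base_dir /path/to/Base_Directory [fusion_src]" >&2
        return 2
    fi
    if [ ! -d "$fusion_src" ]; then
        echo "gene filter directory not found: $fusion_src" >&2
        return 1
    fi
    mkdir -p "$base_dir" || return 1
    # Same copy as the cp -R command above; the files end up in $base_dir/Fusion
    cp -R "$fusion_src" "$base_dir"
}
```

For example: setup_base_dir /path/to/Base_Directory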

Once these things have been defined/created, run this command:

mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory [RefLib Name] [GeneModel Name] /path/to/input/configuration/file > run.log

where [RefLib Name] and [GeneModel Name] are replaced with the desired Reference Build and Gene Model pair (see the list above).
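
To avoid launching a run with a missing configuration file or without the module loaded, the command above can be wrapped in a function that checks its inputs first. This is a hedged sketch: run_fusionmap and the FM_DRY_RUN variable are hypothetical names introduced here, and the only FusionMap invocation used is the --semap form shown above.

```shell
#!/bin/bash
# Sketch of a wrapper around the --semap command shown above.
# run_fusionmap and FM_DRY_RUN are hypothetical names, not part of FusionMap.
run_fusionmap() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_fusionmap BASE_DIR REFLIB GENE_MODEL CONFIG" >&2
        return 2
    fi
    local base_dir="$1" reflib="$2" gene_model="$3" config="$4"

    # FMBIN is expected to be set by 'module load fusionmap'
    if [ -z "$FMBIN" ]; then
        echo "FMBIN is not set; run 'module load fusionmap' first" >&2
        return 1
    fi
    if [ ! -f "$config" ]; then
        echo "configuration file not found: $config" >&2
        return 1
    fi
    if [ -n "$FM_DRY_RUN" ]; then
        # Print the command instead of running it
        echo "mono $FMBIN/FusionMap.exe --semap $base_dir $reflib $gene_model $config"
        return 0
    fi
    mono "$FMBIN/FusionMap.exe" --semap "$base_dir" "$reflib" "$gene_model" "$config" > run.log
}
```

For example, FM_DRY_RUN=1 run_fusionmap /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file prints the command that would be run without starting FusionMap.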

Submitting a single batch job

Create an input configuration file, along the lines of the above examples.

Next, create a qsub script file. The file will contain lines similar to those below. Modify the file paths and reference library names where appropriate before running.

#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load fusionmap
cd /data/user/somewhereWithInputConfigFile
mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file > run.log

Submit the script using the 'qsub' command on Biowulf, as in the example below. Users are advised to run benchmarks to determine what kind of node is suitable for their jobs.

[user@biowulf]$ qsub -l nodes=1 /data/username/theScriptFileAbove

Useful commands:

freen: see http://biowulf.nih.gov/user_guide.html#freen

qstat: search for 'qstat' on http://biowulf.nih.gov/user_guide.html for its usage.

jobload: search for 'jobload' on http://biowulf.nih.gov/user_guide.html for its usage.

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

cd /data/user/run1/; mono $FMBIN/FusionMap.exe .... [options]
cd /data/user/run2/; mono $FMBIN/FusionMap.exe .... [options]
...
cd /data/user/run10/; mono $FMBIN/FusionMap.exe .... [options]

These swarm options are important:

By default, each line of the command file is executed on one processor core of a node and uses 1 GB of memory. If this is not what you want, you will need to specify the '-t' and '-g' flags when you submit the job on Biowulf.

For example, if each line of the command file needs 10 GB of memory instead of the default 1 GB, make sure swarm knows this by including the '-g 10' flag:

[user@biowulf]$ swarm -g 10 -f cmdfile --module fusionmap
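
When there are many run directories, the swarm command file can be generated with a loop rather than written by hand. The following is a sketch under stated assumptions: make_swarm_cmdfile is a hypothetical helper name, and it assumes every run uses the same FusionMap arguments, which are passed in as a single string rather than filled in here.

```shell
#!/bin/bash
# Sketch: write one swarm command line per run directory.
# make_swarm_cmdfile is a hypothetical helper, not part of swarm or FusionMap.
make_swarm_cmdfile() {
    if [ "$#" -lt 3 ]; then
        echo "usage: make_swarm_cmdfile CMDFILE 'FUSIONMAP ARGS' RUN_DIR..." >&2
        return 2
    fi
    local cmdfile="$1" fm_args="$2"
    shift 2

    : > "$cmdfile"    # start with an empty command file
    local dir
    for dir in "$@"; do
        # $FMBIN is kept literal (single quotes) so that it is expanded
        # when the job runs, after the fusionmap module has been loaded
        printf 'cd %s; mono $FMBIN/FusionMap.exe %s\n' "$dir" "$fm_args" >> "$cmdfile"
    done
}
```

For example, make_swarm_cmdfile cmdfile '[options]' /data/user/run1 /data/user/run2 writes two lines in the same form as the sample file above, with '[options]' standing for your actual FusionMap arguments. A relative configuration file path in the arguments is resolved inside each run directory, because each line changes directory first.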

For more information regarding running swarm, see swarm.html

Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below, and run the interactive job there.

[user@biowulf]$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
      
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load fusionmap
[user@p4]$ cd /data/userID/fusionmap/run1
[user@p4]$ mono $FMBIN/FusionMap.exe .... [options]
[user@p4]$ ...
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$

The user may add node properties to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this:

[user@biowulf]$ qsub -I -l nodes=1:g24:c16

Sample Script

This script automates the basic use of FusionMap, from building reference indices to aligning FASTQ files to the references. It uses an Ensembl build and annotation, but this can be modified.

Documentation

http://www.omicsoft.com/fusionmap/