FusionMap is an efficient fusion aligner that aligns reads spanning fusion junctions directly to the genome without prior knowledge of potential fusion regions. It detects and characterizes fusion junctions at base-pair resolution. FusionMap can be applied to detect fusion junctions in both single- and paired-end datasets from either gDNA-Seq or RNA-Seq studies.
How To Use
First set your environment for using FusionMap:
module load fusionmap
Due to licensing restrictions, only the most current version of FusionMap can be made available. See the modules documentation for more information on modules.
FusionMap is run with an input configuration file. Examples are found in $FMBIN/../TestDataset. Text that follows '//' is treated as a comment and is ignored by FusionMap.
Please note that the file paths in the examples must be changed to match your own data.
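To read one of the example configuration files without its comments, a small shell helper can filter them out. This is a minimal sketch: strip_fm_comments is a hypothetical name, not part of FusionMap, and it assumes that no setting value itself contains '//' (such a value would be truncated).

```shell
# strip_fm_comments: hypothetical helper, not part of FusionMap.
# Prints a configuration file with '//' comments and blank lines removed.
strip_fm_comments() {
    # Delete everything from the first '//' to the end of the line,
    # trim trailing whitespace, then drop lines that are now empty.
    sed -e 's|//.*$||' -e 's/[[:space:]]*$//' "$1" | grep -v '^$'
}

# Usage (paths are placeholders):
#   ls "$FMBIN/../TestDataset"
#   strip_fm_comments /path/to/input/configuration/file
```

The original file is left untouched; the helper only changes what is printed.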
Define a directory as the Base_Directory. The Base_Directory will hold downloaded or built reference and index files needed to run FusionMap. If FusionMap can't find the required files, it will attempt to download them from here. NOTE: Downloading will fail if attempted from a cluster node, as the Biowulf cluster is NOT connected to the internet. This step must be run on Helix.
The Helix Systems staff maintains a small number of commonly used reference libraries in /fdb/fusionmap:
Reference Build    Gene Model
Human.B37          RefGene
Human.B37          Ensembl.R70
Human.B37          UcscGene20130723
Human.B37.3        RefGene
Human.B37.3        Ensembl.R73
Human.B37.3        UcscGene20130723
Human.hg19         RefGene
Human.hg19         Ensembl.R73
Human.hg19         UcscGene20130723
Mouse.B38          RefGene
Mouse.B38          Ensembl.R73
Mouse.B38          UcscGene20130723
Mouse.mm10         RefGene
Mouse.mm10         Ensembl.R73
Mouse.mm10         UcscGene20130723
Please contact firstname.lastname@example.org if you would like an additional reference library installed.
NOTE on gene filters: If you build your own reference library, you will need to copy a set of files into the Base_Directory for filtering the genes:
cp -R /fdb/fusionmap/Fusion /path/to/[user-defined Base_Directory]
where [user-defined Base_Directory] is your created reference library Base_Directory.
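The directory creation and the copy step can be combined into one guarded helper. This is only a sketch: prepare_base_dir is a hypothetical name, and the optional second argument exists solely so the filter source can be overridden; by default it uses /fdb/fusionmap/Fusion as given above.

```shell
# prepare_base_dir: hypothetical helper, not part of FusionMap.
# Creates the Base_Directory if needed and copies the gene-filter files into it.
prepare_base_dir() {
    base_dir=$1
    filter_src=${2:-/fdb/fusionmap/Fusion}   # source of the gene-filter files
    if [ -z "$base_dir" ]; then
        echo "usage: prepare_base_dir BASE_DIR [FILTER_SRC]" >&2
        return 2
    fi
    if [ ! -d "$filter_src" ]; then
        echo "filter directory not found: $filter_src" >&2
        return 1
    fi
    # cp -R places the source directory inside the existing Base_Directory.
    mkdir -p "$base_dir" && cp -R "$filter_src" "$base_dir"
}

# Usage (the path is a placeholder):
#   prepare_base_dir /path/to/Base_Directory
```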
Once these things have been defined/created, run this command:
mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory [RefLib Name] [GeneModel Name] /path/to/input/configuration/file > run.log
where [RefLib Name] and [GeneModel Name] are replaced with the desired reference build and gene model pair (see the list of reference libraries above).
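Because a mistyped path only shows up once FusionMap has started, it can help to check the arguments first. The wrapper below is a sketch under that assumption: run_fusionmap_semap is a hypothetical name, and it runs the same command as above, only after the Base_Directory and the configuration file have been found.

```shell
# run_fusionmap_semap: hypothetical wrapper, not part of FusionMap.
# Arguments: Base_Directory, RefLib name, GeneModel name, configuration file.
run_fusionmap_semap() {
    if [ "$#" -ne 4 ]; then
        echo "usage: run_fusionmap_semap BASE_DIR REFLIB GENEMODEL CONFIG" >&2
        return 2
    fi
    base_dir=$1
    reflib=$2
    gene_model=$3
    config=$4
    if [ ! -d "$base_dir" ]; then
        echo "Base_Directory not found: $base_dir" >&2
        return 1
    fi
    if [ ! -r "$config" ]; then
        echo "configuration file not readable: $config" >&2
        return 1
    fi
    # Same command as above with quoted arguments; output goes to run.log.
    mono "$FMBIN/FusionMap.exe" --semap "$base_dir" "$reflib" "$gene_model" "$config" > run.log
}

# Usage (paths are placeholders; run "module load fusionmap" first):
#   run_fusionmap_semap /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file
```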
Create an input configuration file, along the lines of the above examples.
Next, create a qsub script file. The file will contain lines similar to those below. Modify the file paths and reference file labels where appropriate before running.
#!/bin/bash
# This file is YourOwnFileName
#
#PBS -N yourownfilename
#PBS -m be
#PBS -k oe

module load fusionmap
cd /data/user/somewhereWithInputConfigFile
mono $FMBIN/FusionMap.exe --semap /path/to/Base_Directory Human.B37.3 RefGene /path/to/input/configuration/file > run.log
Users are encouraged to run benchmarks to determine what kind of node is suitable for their jobs. Submit the script using the 'qsub' command on Biowulf, e.g.:
[user@biowulf]$ qsub -l nodes=1 /data/username/theScriptFileAbove
freen: see http://biowulf.nih.gov/user_guide.html#freen
qstat: search for 'qstat' on http://biowulf.nih.gov/user_guide.html for its usage.
jobload: search for 'jobload' on http://biowulf.nih.gov/user_guide.html for its usage.
Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.
Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:
cd /data/user/run1/; mono $FMBIN/FusionMap.exe .... [options]
cd /data/user/run2/; mono $FMBIN/FusionMap.exe .... [options]
...
cd /data/user/run10/; mono $FMBIN/FusionMap.exe .... [options]
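When there are many run directories, the command file can be generated rather than typed by hand. The helper below is a sketch: make_swarm_cmdfile is a hypothetical name, and it assumes every run uses the same FusionMap arguments, passed in as a single string.

```shell
# make_swarm_cmdfile: hypothetical helper, not part of swarm or FusionMap.
# Arguments: command file to write, FusionMap arguments, then the run directories.
make_swarm_cmdfile() {
    cmdfile=$1
    fm_args=$2
    shift 2
    : > "$cmdfile"    # start with an empty command file
    for run_dir in "$@"; do
        # $FMBIN stays literal here so it is expanded when the job runs,
        # after the fusionmap module has been loaded.
        printf 'cd %s; mono $FMBIN/FusionMap.exe %s\n' "$run_dir" "$fm_args" >> "$cmdfile"
    done
}

# Usage (paths and the config file name are placeholders):
#   make_swarm_cmdfile /data/username/cmdfile \
#       '--semap /path/to/Base_Directory Human.B37.3 RefGene config.txt' \
#       /data/user/run1 /data/user/run2
```

Each run directory produces exactly one line, which matches the one-command-per-line layout swarm expects.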
These swarm options are important:
- -f: the swarm command file name above (required)
- --module: the module(s) required by the commands. In this case, include --module fusionmap on the swarm command line.
- -t: number of processors per node to use for each command line in the swarm file (optional)
- -g: GB of memory needed for each command line in the swarm file (optional)
By default, each command line in the swarm file is executed on 1 processor core of a node and uses 1 GB of memory. If this is not what you want, specify the '-t' and '-g' flags when you submit the job on Biowulf.
For example, if each command line needs 10 GB of memory instead of the default 1 GB, tell swarm by including the '-g 10' flag:
[user@biowulf]$ swarm -g 10 -f cmdfile --module fusionmap
For more information regarding running swarm, see swarm.html
Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as shown below and run the interactive job there.
[user@biowulf]$ qsub -I -l nodes=1
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready
[user@p4]$ cd /data/user/myruns
[user@p4]$ module load fusionmap
[user@p4]$ cd /data/userID/fusionmap/run1
[user@p4]$ mono $FMBIN/FusionMap.exe .... [options]
[user@p4]$ ...
[user@p4]$ exit
qsub: job 2236960.biobos completed
[user@biowulf]$
Users may add node properties to the qsub command to request a specific type of interactive node. For example, if you need a node with 24 GB of memory to run a job interactively, do this:
[user@biowulf]$ qsub -I -l nodes=1:g24:c16
This script automates the basic use of FusionMap, from building reference indices to aligning FASTQ files to the references. It uses an Ensembl build and annotation, but this can be modified.