Biowulf at the NIH
RSS Feed
DeFuse on Biowulf

deFuse is a software package for gene fusion discovery using RNA-Seq data. It uses clusters of discordant paired-end alignments to inform a split-read alignment analysis that locates fusion boundaries. The software also employs a number of heuristic filters to reduce the number of false positives, and produces fully annotated output for each predicted fusion. The deFuse algorithm is published here, and its use to discover gene fusions in tumour samples is described here.
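The discordant-pair idea above can be sketched in a few lines: a paired-end read is "discordant" when its two ends align to different chromosomes, or too far apart to be explained by the library's insert size. The function below is only an illustration of that concept, not deFuse's actual implementation; the threshold and field layout are made up for the example.

```python
# Toy illustration of discordant paired-end classification (NOT deFuse code).
# A pair is concordant when both ends map to the same chromosome within the
# expected insert-size range; anything else hints at a structural event such
# as a gene fusion.

def is_discordant(chrom1, pos1, chrom2, pos2, max_insert=500):
    """Return True if a read pair cannot be explained by the normal insert size."""
    if chrom1 != chrom2:                   # ends on different chromosomes
        return True
    return abs(pos2 - pos1) > max_insert   # ends implausibly far apart

# Example pairs: (chrom1, pos1, chrom2, pos2)
pairs = [
    ("chr1", 1000, "chr1", 1350),   # concordant
    ("chr1", 1000, "chr7", 5000),   # inter-chromosomal -> discordant
    ("chr1", 1000, "chr1", 90000),  # same chromosome but too far -> discordant
]
discordant = [p for p in pairs if is_discordant(*p)]
print(len(discordant))  # 2
```

deFuse then clusters such discordant pairs and uses the clusters to guide a split-read search for the exact fusion boundary.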

Important Note

The deFuse data files are under /fdb/defuse.
The config files /usr/local/apps/defuse/current/scripts/config_hg19.txt and config_hg18.txt can be copied to a user's area and customized.

The necessary environment variables need to be set first. The easiest way to do this is with the module commands, as in the example below.

[user@biowulf]$ module avail defuse
----------------------------- /usr/local/Modules/3.2.9/modulefiles --------------------------
defuse/0.4.3          defuse/0.6.0(default)

[user@biowulf]$ module load defuse

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) defuse/0.6.0

[user@biowulf]$ module unload defuse

[user@biowulf]$ module load defuse/0.4.3

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) defuse/0.4.3

[user@biowulf]$ module show defuse
-------------------------------------------------------------------
/usr/local/Modules/3.2.9/modulefiles/defuse/0.6.0:

module-whatis    Sets up defuse 0.6.0
prepend-path     PATH /usr/local/apps/defuse/0.6.0/scripts
-------------------------------------------------------------------

Running a batch job

First, create your own config file. Two config files, config_hg18.txt and config_hg19.txt, have already been created for hg18 and hg19 under /usr/local/apps/defuse/current/scripts. Copy the appropriate file to your own directory and modify it as needed.
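For orientation, the entries you are most likely to edit in your copy point deFuse at its reference data. The excerpt below is only a sketch; the exact parameter names and values are defined in the config_hg19.txt you copied, so verify against that file.

```
# Illustrative excerpt of a customized deFuse config -- verify the exact
# parameter names and values against your copy of config_hg19.txt.
# On Biowulf, the deFuse reference data are under /fdb/defuse.
dataset_directory = /fdb/defuse
```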

Create a script for deFuse along the following lines:

# This file is runDeFuse
#PBS -N DeFuse
#PBS -m be
#PBS -k oe

module load defuse
cd /data/$USER/defuse/run1
defuse.pl -c /YourPath/config_hg19.txt -1 input1 -2 input2 -o YourOutputFileDirectory -p 24

Note: 24 threads ('-p 24') are specified here because the job will be submitted to a node with 24 processors (for example, g24:c24; see below). If you use a different node type, change this value accordingly. Run 'freen' on Biowulf to check the core counts of the different node types.

Finally, you submit this file on biowulf:

[user@biowulf]$ qsub -l nodes=1:g24:c24 /data/$USER/defuse/run1/runDeFuse

Users may need other kinds of nodes depending on their requirements. Please refer to the user guide for the currently available node types.


Running an interactive job

Users may sometimes need to run jobs interactively. Such jobs should not be run on the Biowulf login node. Instead, allocate an interactive node as described below and run the interactive job there.

[user@biowulf] $ qsub -I -l nodes=1:g24:c16
qsub: waiting for job 2236960.biobos to start
qsub: job 2236960.biobos ready

[user@pxxxx]$ module load defuse
[user@pxxxx]$ defuse.pl -c /YourPath/config_hg19.txt -1 input1 -2 input2 -o YourOutputFileDirectory -p 16
[user@pxxxx]$ ...........
[user@pxxxx] exit
qsub: job 2236960.biobos completed

In this example, '-p 16' was used because the requested g24:c16 node has 16 cores and can therefore run 16 threads. To request a different kind of node, say a g8 node with 8 GB of memory, do this:

[user@biowulf]$ qsub -I -l nodes=1:g8

Submitting a swarm of jobs

Using the 'swarm' utility, one can submit many jobs to the cluster to run concurrently.

Set up a swarm command file (e.g. /data/username/cmdfile). Here is a sample file:

defuse.pl -c /YourPath/config_hg19.txt -1 input1 -2 input2 -o YourOutputFileDirectory1 -p 4
defuse.pl -c /YourPath/config_hg19.txt -1 input1 -2 input2 -o YourOutputFileDirectory2 -p 4

Note: 4 processors are specified here because each command will be submitted to a node with at least 4 processors (for example, g8; see below). If you use a different node type, change this value accordingly.

One swarm flag is required, '-f', and two others, '-t' and '-g', are the ones users most often need to specify when submitting a swarm job:

-f: the swarm command file name (required)
-t: number of processors to use for each command line in the swarm file (optional)
-g: gigabytes of memory needed for each command line in the swarm file (optional)

By default, each command line is executed on one processor core of a node and may use 1 GB of memory. If this is not what you want, specify the '-t' and '-g' flags when you submit the job on Biowulf. The example above uses 4 processors per command, so '-t 4' is needed when you submit the swarm job.

If each command will also need 4 GB of memory instead of the default 1 GB, tell swarm by including the '-g 4' flag:

[user@biowulf]$ swarm -g 4 -t 4 -f cmdfile

For more information regarding running swarm, see swarm.html