Biowulf at the NIH
Random Jungle on Biowulf
Random Jungle, developed by Daniel F. Schwarz, is a free random forest implementation for high-dimensional data. It is intended to be usable across a broad range of applications.

Random Jungle website.
Schwarz DF, Konig IR, Ziegler A. On Safari to Random Jungle: A fast implementation of Random Forests for high dimensional data. Bioinformatics. 2010 May 26.

Usage

There are two executables: rjungle and rjunglesparse. Several versions of Random Jungle are installed on Helix/Biowulf under /usr/local/apps/rjungle.

Note that v2.0.0 segfaults on the Biowulf computational nodes, which run a different version of the OS (CentOS 5) than Helix. Thus, v2.0.0 should only be run on Helix.

The easiest way to select a particular version is to use the module commands, as in the example below:

[user@biowulf]$ module avail rjungle

----------------- /usr/local/Modules/3.2.9/modulefiles ---------------------------
rjungle/1.0.359               rjungle/1.3.0                 
rjungle/1.2.362-mpi           rjungle/1.3.0-mpi             rjungle/2.0.0-linux
rjungle/1.2.365               rjungle/2.0.0-mpi

[user@biowulf]$ module load rjungle/2.0.0-centos

[user@biowulf]$ module list
Currently Loaded Modulefiles:
  1) rjungle/2.0.0-centos

If you type 'module load rjungle' without specifying a version, the default version will be loaded.

To submit a swarm of Random Jungle jobs, set up a swarm command file with one job per line, along the following lines:

#this file is rjswarm
cd /data/user/mydir1; rjungle [...options...]
cd /data/user/mydir2; rjunglesparse [...options...]
cd /data/user/mydir3; rjungle [...options...]
...
Submit this swarm with the command:
swarm -f rjswarm --module rjungle/
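
When there are many working directories, the swarm command file can be generated rather than typed by hand. The following is a minimal sketch, not part of the Biowulf tooling: the function name rjswarm_lines, the directory names, and the file name data.ped are hypothetical, and the options shown are simply the flags used in the R example on this page.

```shell
# Print one swarm line per working directory.
# $1 is the option string passed to rjungle; the remaining
# arguments are the directories to run in (all hypothetical here).
rjswarm_lines() {
    rj_opts=$1
    shift
    for rj_dir in "$@"; do
        # Each line is an independent job: change directory, then run rjungle.
        printf 'cd %s; rjungle %s\n' "$rj_dir" "$rj_opts"
    done
}

# Print the lines to the terminal for inspection. To create the
# command file, redirect the output:  rjswarm_lines ... > rjswarm
rjswarm_lines "-f data.ped -p -v -o out" /data/user/mydir1 /data/user/mydir2
```

Printing to standard output first makes it easy to check the lines before writing them to rjswarm and submitting.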

Submitting a single Random Jungle batch job

Sample batch script for a single Random Jungle job:

#!/bin/bash

# load the latest (default) version of Random Jungle
module load rjungle

cd /data/user/mydir
rjungle [...options...]

This job can be submitted with the command:

qsub -l nodes=1 rjscript

Submitting a batch job with Random Jungle and R

This example uses the sample data that is provided with Random Jungle.

Create a batch script along the following lines:

#!/bin/bash

module load rjungle
module load R

R --vanilla << EOF;
## File handling for Random Jungle
rjungleExe <- file.path("//usr/local/apps/rjungle/rjungle-bin-pkg-1.2.365-i686-pc-linux-gnu/bin/rjungle")
pedFile <- file.path("/data/$USER/rjungle/gaw15.ped")
rjungleOutFile <- file.path("/data/$USER/rjungle/out")

## Run Random Jungle on PED file
rjungleCMD <- paste(rjungleExe,
"-f", pedFile,
"-v", ## show processing
"-p", ## read in pedFile
"-o", rjungleOutFile) ## out file path
try(system(rjungleCMD)) 
EOF

Submit this script with:

qsub -l nodes=1 rjungle.bat
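
The R code above only assembles a command string with paste() and hands it to system(). As a rough shell equivalent, the sketch below builds the same command line (same flags, same order) and prints it instead of running it, which can help when checking paths before submitting. The function name build_rjungle_cmd is hypothetical, and the fallback user name is only there so the snippet runs when USER is unset.

```shell
# Assemble the command line that the R snippet builds with paste():
#   <executable> -f <PED file> -v -p -o <output prefix>
# $1 = rjungle executable, $2 = PED file, $3 = output file prefix
build_rjungle_cmd() {
    printf '%s -f %s -v -p -o %s\n' "$1" "$2" "$3"
}

# Print the command for inspection; nothing is executed here.
rj_user=${USER:-user}
build_rjungle_cmd rjungle "/data/$rj_user/rjungle/gaw15.ped" "/data/$rj_user/rjungle/out"
```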

The standard output from this job will look like:

R version 3.0.2 (2013-09-25) -- "Frisbee Sailing"
Copyright (C) 2013 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> ## File handling for Random Jungle
> rjungleExe <- file.path("//usr/local/apps/rjungle/rjungle-bin-pkg-1.2.365-i686-pc-linux-gnu/bin/rjungle")
> pedFile <- file.path("/data/susanc/rjungle/gaw15.ped")
> rjungleOutFile <- file.path("/data/susanc/rjungle/out")
> 
> ## Run Random Jungle on PED file
> rjungleCMD <- paste(rjungleExe,
+ "-f", pedFile,
+ "-v", ## show processing
+ "-p", ## read in pedFile
+ "-o", rjungleOutFile) ## out file path
> try(system(rjungleCMD)) 
Start: Thu Jan 16 10:09:12 2014

+---------------------+-----------------+-------------------+
|    Random Jungle    |       2.0.0     |        2013       |
+---------------------+-----------------+-------------------+
|     2008-2011 Daniel F Schwarz et al.,                    |
|     2011-2013 Jochen Kruppa et al.,                       |
|               jochen.kruppa@imbs-uni.luebeck.de           |
+-----------------------------------------------------------+
|     Source: http://www.randomjungle.de                    |
|     Help:   http://groups.google.com/group/randomjungle   |
+-----------------------------------------------------------+

Output to: /data/susanc/rjungle/out.*
Loading data... 
Read 3500 row(s) and 9193 column(s).
Use 3500 row(s) and 9189 column(s).
Dependent variable name: PHENOTYPE
Growing jungle...
Number of variables: 9189 (mtry = 95)
1 thread(s) growing 500 tree(s)
Growing time estimate: ~2 min.
progress: 10%
[...]
Generating and collecting output data...
Writing accuracy information...
Calculating confusion matrix...

Growing time: 164.14 sec
Elapsed time: 170 sec
Finished: Thu Jan 16 10:46:37 2014
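
When many jobs are run, it can be handy to pull a few values out of each job's standard output. The sketch below assumes the log lines have exactly the form shown above ("Number of variables: ... (mtry = ...)" and "Growing time: ... sec"); the function names and the temporary log file are hypothetical, and the name of the real output file depends on how the job was submitted.

```shell
# Print the growing time in seconds from a Random Jungle job log.
growing_time() {
    awk '/^Growing time:/ { print $3 }' "$1"
}

# Print the mtry value from the "Number of variables" line.
mtry_value() {
    sed -n 's/^Number of variables: .*(mtry = \([0-9][0-9]*\)).*/\1/p' "$1"
}

# Demonstration on a small excerpt of the sample output above.
rj_log=$(mktemp)
cat > "$rj_log" << 'EOF'
Number of variables: 9189 (mtry = 95)
1 thread(s) growing 500 tree(s)
Growing time: 164.14 sec
Elapsed time: 170 sec
EOF

echo "mtry: $(mtry_value "$rj_log")"
echo "growing time (sec): $(growing_time "$rj_log")"
rm -f "$rj_log"
```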

Documentation

Random Jungle website: http://www.randomjungle.de