R on Biowulf

R (the R Project) is a language and environment for statistical computing and graphics. R is similar to the award-winning S system, which was developed at Bell Laboratories by John Chambers et al. It provides a wide variety of statistical and graphical techniques (linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, ...).

R is designed as a true computer language with control-flow constructions for iteration and alternation, and it allows users to add additional functionality by defining new functions. For computationally intensive tasks, C, C++ and Fortran code can be linked and called at run time.

Submitting R jobs

NOTE: R is not a parallel program. It is single-threaded, which means that it can only be run on 1 processor. Single, serial jobs are best run on your desktop machine or on Helix. There are two situations in which it is an advantage to run R on Biowulf:
For basic information about setting up an R job, see the R documentation listed at the end of this page. Also see the Batch Queuing System in the Biowulf user guide. Create a script such as the following:
script file /home/username/runR
--------------------------------------------------------------------------
#!/bin/tcsh
# This file is runR
#
#PBS -N R
#PBS -m be
#PBS -k oe
date
/usr/local/bin/R --vanilla < /data/username/R/Rtest.r > /data/username/R/Rtest.out
--------------------------------------------------------------------------
Submit the script using the 'qsub' command, e.g.
qsub -l nodes=1 /home/username/runR

Running a 'swarm' of R jobs

The swarm program is a convenient way to submit large numbers of jobs. Create a swarm command file containing a single job on each line, e.g.
swarm command file /home/username/Rjobs
--------------------------------------------------------------------------
/usr/local/bin/R --vanilla < /data/username/R/R1 > /data/username/R/R1.out
/usr/local/bin/R --vanilla < /data/username/R/R2 > /data/username/R/R2.out
/usr/local/bin/R --vanilla < /data/username/R/R3 > /data/username/R/R3.out
/usr/local/bin/R --vanilla < /data/username/R/R4 > /data/username/R/R4.out
/usr/local/bin/R --vanilla < /data/username/R/R5 > /data/username/R/R5.out
....
--------------------------------------------------------------------------
Submit this by typing:
swarm -f /home/username/Rjobs
Swarm will run 2 jobs per node, since all Biowulf nodes have 2 processors. See the Swarm documentation for more information.
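For a large number of R input files, the command file does not have to be typed by hand. The following is a minimal sketch in Bourne shell, not part of swarm or R: `make_swarm_file` is a hypothetical helper, and it assumes that every input script in the directory ends in `.r` and that the paths contain no spaces. It writes one line per input file in the same form as the command file shown above.

```shell
#!/bin/sh
# Sketch only: make_swarm_file is a hypothetical helper, not a swarm or R tool.
# Usage: make_swarm_file INPUT_DIR COMMAND_FILE
# Writes one complete R command per line, one line per *.r file in INPUT_DIR.
make_swarm_file() {
    in_dir=$1
    cmd_file=$2
    : > "$cmd_file"                      # start with an empty command file
    for script in "$in_dir"/*.r; do
        [ -e "$script" ] || continue     # no *.r files: skip the unexpanded pattern
        # Output goes next to the input, with .r replaced by .out
        printf '/usr/local/bin/R --vanilla < %s > %s.out\n' \
            "$script" "${script%.r}" >> "$cmd_file"
    done
}

# Example (placeholder paths):
#   make_swarm_file /data/username/R /home/username/Rjobs
#   swarm -f /home/username/Rjobs
```

Checking the generated file by eye before running swarm is cheap and catches a wrong directory or an empty match.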

Rmpi and snow

Rmpi is a wrapper to the LAM implementation of MPI. [Rmpi documentation]

The package snow (Simple Network of Workstations) implements a simple mechanism for using a workstation cluster for "embarrassingly parallel" computations in R. [snow documentation]

Users who wish to use Rmpi and snow will need to add the path for LAM to their .cshrc or .bashrc files, as below:
setenv PATH /usr/local/etc:/usr/local/lam/bin:$PATH    (for csh or tcsh)
PATH=/usr/local/lam/bin:$PATH    (for bash)

To run Rmpi on multiple nodes, LAM must be started on those nodes with the lamboot command before Rmpi is loaded. Any spawned Rmpi slaves must be shut down with mpi.close.Rslaves() or mpi.quit() before exiting R, and lamhalt must be run to shut down LAM before exiting the batch job.

Sample Rmpi batch script:
------- this file is myscript.bat--------------------------
#!/bin/csh
#PBS -j oe
cd $PBS_O_WORKDIR
lamboot $PBS_NODEFILE
/usr/local/bin/R --vanilla > myrmpi.out <<EOF
library(Rmpi)
mpi.spawn.Rslaves(nslaves=$np)
mpi.remote.exec(mpi.get.processor.name())
n <- 3
mpi.remote.exec(double, n)
mpi.quit()
EOF
lamhalt
--------------------------------------------------------------
Sample batch script using snow:
------- this file is myscript.bat--------------------------
#!/bin/csh
#PBS -j oe
cd $PBS_O_WORKDIR
lamboot $PBS_NODEFILE
/usr/local/bin/R --vanilla > myrmpi.out <<EOF
library(snow)
cl <- makeCluster($np, type = "MPI")
clusterCall(cl, function() Sys.info()[c("nodename","machine")])
clusterCall(cl, runif, $np)
stopCluster(cl)
mpi.quit()
EOF
lamhalt
--------------------------------------------------------------
Either of the above scripts could be submitted with:
qsub -v np=4 -l nodes=2 myscript.bat
Note that it is entirely up to the user to run the appropriate number of processes for the nodes requested. In the example above, the $np variable is set to 4 and exported via the qsub command, and this variable is used in the script to run 4 snow processes on 2 dual-CPU nodes.

Production runs should be run with batch as above, but for testing purposes an occasional interactive run may be useful.
Sample interactive session with Rmpi: (user input in bold)
Sample interactive session with snow: (user input in bold)
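
Returning to the note above about matching the number of processes to the nodes requested: instead of passing a fixed np, a job could derive the value from the node file. The sketch below is an illustration in Bourne shell rather than csh. `count_procs` is a hypothetical helper, and it assumes that $PBS_NODEFILE lists allocated host names one per line and that every node has the same number of processors (2 by default, as stated for Biowulf above).

```shell
#!/bin/sh
# Sketch only: count_procs is a hypothetical helper, not a PBS or LAM command.
# Usage: count_procs NODEFILE [CPUS_PER_NODE]
# Prints (number of distinct hosts in NODEFILE) * CPUS_PER_NODE.
count_procs() {
    nodefile=$1
    cpus_per_node=${2:-2}                        # assumed default: dual-processor nodes
    nodes=$(sort -u "$nodefile" | grep -c .)     # distinct, non-empty host names
    echo $((nodes * cpus_per_node))
}

# Example inside a Bourne-shell batch job:
#   np=$(count_procs "$PBS_NODEFILE")
```

Counting distinct host names keeps the result the same whether the node file lists each node once or once per processor.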
Documentation
This document is available as http://biowulf.nih.gov/apps/r.html