Programming Tools and Libraries
Biowulf is intended to run code programmed by our users as well as commercial and open-source codes that may need to be built for our platform(s) if they do not come in a useable binary format. Accordingly, we host a number of compilers and build environments to suit the needs of developers and individuals that need to build projects from source.
This page provides information specific to the Biowulf development environment as well as a rough overview of the various compilers, libraries and programs used on our system. The linked documentation on specific packages and programs will usually need to be consulted for any useful understanding of them.
All Biowulf cluster nodes include the GCC compiler suite, which provides C, C++, FORTRAN77 and FORTRAN90/95 compilers (gcc, g++, g77 and gfortran respectively) along with the GNU debugger (gdb). In addition to these default compilers, two other popular suites are available to Helix/Biowulf users that may improve the performance of your project or better accommodate certain code bases: the Intel and Portland Group (PGI) compiler suites.
Each compiler suite has a listing which includes a chart that shows the location of a set-up script that will enable the compiler in your environment, lists common front-ends for each compiler and shows the locations of various MPI installations depending on target architecture and desired interconnect (see MPI section below for details on MPI installations). For instance, if you wanted to use the Intel compilers to build your project, and your current shell is bash, the following command would set up your environment:
% source /usr/local/intel/intelvars.sh
Arch is i386.
setting up for Intel C compiler version 10.1.018.
setting up for Intel Fortran compiler version 10.1.018.
setting up for Intel debugger verion 10.1.018.
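When a set-up script is sourced from a job script rather than typed at the prompt, it can help to check that the script is readable first, so a missing or renamed script produces a clear message. The following is a minimal sketch of that idea; source_if_present is a hypothetical helper name and not part of the Biowulf environment, and the Intel path is the one used in the example above.

```shell
# source_if_present FILE: source FILE into the current shell if it is readable.
# Hypothetical helper for illustration; not provided by Biowulf.
source_if_present() {
    if [ -r "$1" ]; then
        # "." is the portable spelling of bash's "source"
        . "$1"
    else
        echo "setup script not found: $1" >&2
        return 1
    fi
}

# Path taken from the example above; on a system without it, this reports the miss.
source_if_present /usr/local/intel/intelvars.sh || echo "Intel compilers not set up"
```

Because the helper returns a non-zero status when the script is missing, a job script can stop early instead of building with the wrong compiler.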
The venerable GNU compiler suite is available in the user's PATH by default. This is the system default version that comes with the operating system. Though not considered "high-performance" or "optimized," the GNU compilers are usually the best choice for pre-existing source code, since build systems are often created with them in mind (indeed, a lot of code will run faster using GCC unless pains have been taken to include optimal flags for the various proprietary compilers). Consequently, these build systems generate sensible compiler flags and building is comparatively trouble-free. However, if performance is an issue, consult the documentation distributed with the source you are trying to build to see whether other compilers are supported. If you're developing a high-performance application "in-house," you may want to explore the other compilers available on Biowulf.
GCC quick-chart
Current Version: 4.1.2
Documentation: Try "man gcc" or "man g++" or "man g77" or "man gfortran".
Primary front-ends:
|C||gcc|
|C++||g++|
|Fortran77||g77|
|Fortran90/95||gfortran|
Some users need features offered only by the latest release of the GNU compiler suite, so we also manually maintain recent 4.x releases. Programs built with these newer GNU compilers need additional environment variables set before they are executed; loading the appropriate module takes care of this. Alternatively, the user can specify "-static-libgcc", "-static-libgfortran" or "-static-libstdc++" (depending on the language) during the build phase to compile in these runtimes and thus avoid needing to set different runtime paths. The MPI installations below can be used only after you have loaded the appropriate module.
Latest GCC quick-chart
Current Version: 4.8.3
Module: module load gcc/4.8.3
Documentation: Try "man gcc" or "man g++" or "man gfortran".
Primary front-ends:
|C||gcc|
|C++||g++|
|Fortran77||gfortran|
|Fortran90/95||gfortran|
To list all available non-system versions of the GCC compiler, run:
module avail gcc
To revert to the system GCC after loading a newer GCC version, run:
module unload gcc
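The static-runtime alternative mentioned for the newer GCC releases depends on the language being built. The sketch below maps a language to the matching link option; static_runtime_flag is a hypothetical helper, solver.f90 is a placeholder file name, and the C++ entry assumes GCC's documented "-static-libstdc++" option. The build command is only composed and printed, not run.

```shell
# static_runtime_flag LANG: print the GCC option that links the matching
# language runtime statically. Hypothetical helper for illustration.
static_runtime_flag() {
    case "$1" in
        c)        echo "-static-libgcc" ;;
        c++)      echo "-static-libstdc++" ;;
        fortran)  echo "-static-libgfortran" ;;
        *)        echo "unknown language: $1" >&2; return 1 ;;
    esac
}

# Compose (but do not run) a build command; solver.f90 is a placeholder.
build_cmd="gfortran $(static_runtime_flag fortran) -o solver solver.f90"
echo "$build_cmd"
```

A binary linked this way carries its compiler runtime with it, so it should not need the newer GCC module loaded at run time.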
The Portland Group suite includes the usual set of C, C++, FORTRAN77 and FORTRAN90/95 compilers. Also included are an OpenMP implementation, preliminary support for FORTRAN2000 and PGDBG, a graphical debugger (see debugging section below).
PGI quick-chart
Current Version: 14.1
Setup Scripts:
|bash||/usr/local/pgi/pgivars14.sh|
|csh/tcsh||/usr/local/pgi/pgivars14.csh|
Documentation: PGI Compiler Documentation
Primary front-ends:
|C||pgcc|
|C++||pgCC|
|Fortran77||pgf77|
|Fortran90||pgf90|
|Fortran95||pgf95|
Older PGI compiler versions are also available:
|bash||source /usr/local/pgi/pgivars13.sh|
|t/csh (C-shell)||source /usr/local/pgi/pgivars13.csh|
11.10
|bash||source /usr/local/pgi/pgivars11.sh|
|t/csh (C-shell)||source /usr/local/pgi/pgivars11.csh|
10.3
|bash||source /usr/local/pgi/pgivars10.sh|
|t/csh (C-shell)||source /usr/local/pgi/pgivars10.csh|
The Intel suite includes C, C++, FORTRAN77 and FORTRAN90/95 compilers along with OpenMP and the Intel debugger. Anecdotal evidence suggests that this compiler suite frequently provides the best performance for calculation-intensive applications. Included with these compilers are the Intel Math Kernel Library (MKL), LINPACK and Intel Performance Primitives (IPP), all discussed in the scientific libraries section below.
Intel quick-chart
Current Version: 2015.1.133 (2015 Update 1)
Setup Command:
|bash||source /usr/local/intel/intelvars15_u1.sh|
|t/csh (C-shell)||source /usr/local/intel/intelvars15_u1.csh|
Documentation: Intel Documentation Site
Primary front-ends:
|C||icc|
|C++||icpc|
|Fortran77/90/95||ifort|
Older Intel compiler versions are also available:
|bash||source /usr/local/intel/intelvars15.sh|
|t/csh (C-shell)||source /usr/local/intel/intelvars15.csh|
2013.2.144
|bash||source /usr/local/intel/intelvars13_sp1.sh|
|t/csh (C-shell)||source /usr/local/intel/intelvars13_sp1.csh|
2013.0.079
|bash||source /usr/local/intel/intelvars13.sh|
|t/csh (C-shell)||source /usr/local/intel/intelvars13.csh|
Several Java Development Kits are installed in /usr/local/java. Older versions are available for applications that require them. The latest is usually the best choice.
- /usr/local/java/latest (32-bit JDK)
- /usr/local/java64/latest (64-bit JDK)
java [options] -jar jarfile
All java-based applications can utilize these options.
Specifying memory. Including these options will configure the amount of memory required to run the java-based application. [size] can be defined in kilobytes (e.g. 5k), megabytes (10m), or gigabytes (8g).
- -Xms[size] set initial heap size
- -Xmx[size] set maximum heap size
- -Xss[size] set thread stack size
It is very common to include -Xmx4g with calls to java. This allows the heap to grow to 4GB, so that much memory must be available to the java instance.
Specifying scratch space. Java-based applications will very often require a scratch space for creating temporary files during execution. By default, this is set to /tmp. Unfortunately, many genomic java applications require much more scratch space than is available in /tmp. Worse, running multiple instances of java on a single node may fill up /tmp. In this case, including the option
-Djava.io.tmpdir=[TMPDIR]
will configure java to use [TMPDIR] as a scratch space. Typically, this can be set to /scratch:
java -Djava.io.tmpdir=/scratch -jar jarfile
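The memory and scratch-space options can be combined on one command line. As a rough sketch, the hypothetical helper java_command below only builds and prints such a command; the heap size, the scratch directory and the name jarfile are the illustrative values used above, not requirements of any particular application.

```shell
# java_command HEAP TMPDIR JAR: print a java command line that sets the
# maximum heap size and the scratch directory. Hypothetical helper; it
# only builds the string and does not start java.
java_command() {
    heap=$1
    tmpdir=$2
    jar=$3
    echo "java -Xmx${heap} -Djava.io.tmpdir=${tmpdir} -jar ${jar}"
}

# 4 GB maximum heap, temporary files under /scratch
java_command 4g /scratch jarfile
```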
For more information about how to configure java-based applications, type
at the prompt, or see http://www.oracle.com/technetwork/java.
Parallel applications on Biowulf typically use MPI as the means of inter-process communication across our various network interconnects. MPI is an application programming interface specification that currently exists in two major versions: MPI-1 and MPI-2. These APIs are implemented by a number of vendors and projects.
The Biowulf staff maintains some popular MPI implementations for the convenience of our users. OpenMPI is an excellent MPI implementation that covers all of the high-performance networks available on Biowulf (Infiniband, Infinipath and Gigabit Ethernet), MPICH is a very popular and mature implementation for message passing over Ethernet networks and MVAPICH is MPICH with an additional Infiniband network target.
Current version: 1.6
OpenMPI is an excellent MPI implementation with plenty of options and capabilities while being generally quite easy to use. A binary built using OpenMPI can be used on any of Biowulf's high-performance networks regardless of the target network used during build time. This is because the target network is chosen at run-time. For this reason, it is not possible to build static MPI binaries using the OpenMPI compiler wrappers. The best source for documentation on OpenMPI comes from the project website.
This chart shows all of the OpenMPI installations available on Biowulf sorted by compiler and target interconnect. In practice, only the target compiler matters at build time; any interconnect can be used after the binaries are built. The target interconnect is chosen at run-time by using the appropriate mpirun when launching a job.
|Target Compiler||Target Interconnect||OpenMPI installation root|
|Intel Compiler Suite||Ethernet||/usr/local/OpenMPI/current/intel/eth|
|Portland Group Compilers (PGI)||Ethernet||/usr/local/OpenMPI/current/pgi/eth|
Using OpenMPI requires finding the appropriate OpenMPI installation and then setting your PATH accordingly. In this example we want to build with the PGI compilers and then run on Ethernet, Infinipath and Infiniband:
[user@biowulf ~]% qsub -I -l nodes=1
qsub: waiting for job 2078457.biobos to start
qsub: job 2078457.biobos ready
# First set up the compiler we want to use:
[user@node ~]% source /usr/local/pgi/pgivars.sh
# Now set our PATH to an OpenMPI install intended for use with that compiler
[user@node ~]% export PATH=/usr/local/OpenMPI/current/pgi/eth/bin:$PATH
# compile programs with the OpenMPI wrappers
[user@node ~]% mpif90 -o dothings mpi_fortran_src.f90
The resultant binary can be used on the various networks available to Biowulf by using the appropriate mpirun command. Note that you must include the full path to mpirun rather than letting your PATH variable find it for you. This is because OpenMPI uses the execution string to "find itself" and link the appropriate libraries. You can use the "which" command to make your shell spell out the path to mpirun if the correct mpirun is in your PATH.
# Ethernet
[user@node ~]% /usr/local/OpenMPI/current/pgi/eth/bin/mpirun -n32 \
    -machinefile $PBS_NODEFILE ~/dothings
# Infiniband
[user@node ~]% /usr/local/OpenMPI/current/pgi/ib/bin/mpirun -n32 \
    -machinefile $PBS_NODEFILE ~/dothings
# Infinipath
[user@node ~]% /usr/local/OpenMPI/current/pgi/ipath/bin/mpirun -n32 \
    -machinefile $PBS_NODEFILE ~/dothings
# It's not really even necessary to use the same compiler installation
# at run-time, only the target interconnect matters:
[user@node ~]% /usr/local/OpenMPI/current/intel/ib/bin/mpirun -n32 \
    -machinefile $PBS_NODEFILE ~/dothings
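Since the full path to mpirun differs only in the compiler and interconnect components, a job script can assemble it from those two values. The following is a minimal sketch under that assumption; openmpi_mpirun is a hypothetical helper, and the directory layout and the interconnect names (eth, ib, ipath) are taken from the examples above. The final command is printed rather than executed.

```shell
# openmpi_mpirun COMPILER INTERCONNECT: print the full path to mpirun for an
# OpenMPI installation, following the directory layout shown above.
# Hypothetical helper for illustration.
openmpi_mpirun() {
    case "$2" in
        eth|ib|ipath) ;;
        *) echo "unknown interconnect: $2" >&2; return 1 ;;
    esac
    echo "/usr/local/OpenMPI/current/$1/$2/bin/mpirun"
}

# Build the launch line for the PGI build on Infiniband and print it.
mpirun_path=$(openmpi_mpirun pgi ib)
echo "$mpirun_path -n32 -machinefile \$PBS_NODEFILE ~/dothings"
```

Rejecting unknown interconnect names up front avoids launching a job with a path that does not exist.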
As usual, consult the OpenMPI documentation for complete information on using OpenMPI.
Current version: 1.3.2p1
MPICH2 is an implementation of MPI versions 1 and 2 for Ethernet networks developed largely at Argonne National Laboratory. This chart shows the MPICH2 installations available on Biowulf.
We are preparing to build an MPI project that uses Ethernet as its MPI network and we want to use the Intel compilers to create the binaries.
Arch is x86_64.
setting up for Intel C compiler version 10.1.018.
setting up for Intel Fortran compiler version 10.1.018.
setting up for Intel debugger verion 10.1.018.
[janeuser@p2 ~]$ export PATH=/usr/local/mpich2-intel/bin:$PATH
Now we can use the MPI wrappers in our PATH to build MPI programs:
For complete documentation on using MPICH2, consult the latest version of the MPICH2 user's guide on Argonne National Laboratory's MPICH2 site.
Current version: 1.6
MVAPICH2 is a special version of MPICH with some additional target interconnects (Infiniband, iWARP and RoCE). On Biowulf, we care about the Infiniband target interconnect. The following chart shows the MVAPICH2 installations available on Biowulf by compiler:
|GNU Suite (gcc)||
|Portland Group (PGI)||
Example using MVAPICH2 with the Intel compilers:
% qsub -I -l nodes=1:ib
qsub: waiting for job 2078457.biobos to start
qsub: job 2078457.biobos ready
% source /usr/local/intel/intelvars.sh
% export PATH=/usr/local/mvapich2-intel/bin:$PATH
% mpicxx -o dothings mpi_cpp_src.cpp
MVAPICH2 Documentation can be found here.
Listed here are a few of the more notable libraries and suites available to developers of scientific and/or high-performance software. These are mostly various implementations of BLAS, LAPACK, etc.; developers should review each to find the one that best fits their needs.
FFTW is a popular open-source fast Fourier transform library. 32- and 64-bit versions of the library can be found here:
/usr/local/fftw-2.1.5/
The Intel Math Kernel Library (MKL) is a set of optimized and threadable math routines for scientific, engineering and financial applications. It includes BLAS, LAPACK, ScaLAPACK, FFTs, a vector math library and random number generators. It is provided with the Intel compiler suite.
The AMD Core Math Library (ACML) is AMD's implementation of several common math routine libraries: full Level 1, 2 and 3 BLAS, LAPACK, FFTs and a number of routine sets specific to the ACML. These should run exceptionally well on Biowulf's Opteron nodes. AMD was kind enough to build the ACML for the major Fortran compiler suites. Installations are available here according to compiler:
|Compiler||ACML installation|
|GNU (gfortran)||/usr/local/acml/gfortran|
|Intel (ifort)||/usr/local/acml/ifort|
|PGI (pgf77/90/95)||/usr/local/acml/pgi|
From Intel's website:
Integrated Performance Primitives (Intel® IPP) is an extensive library of multi-core-ready, highly optimized software functions for multimedia data processing, and communications applications.
It includes, among other things, routines for audio/video encoding and decoding, image processing, signal processing, vector/matrix math and data compression. IPP is installed here according to target architecture:
/usr/local/intel/ipp/current
The GNU Scientific Library (GSL) is the open-source scientific library provided by GNU for C and C++ developers. It provides a large array of math routines with an extensive test suite. A complete list of routines and capabilities is available on the GSL website. The current version is 1.15, installed under:
/usr/local/GSL/[version]
To use it, add its bin and lib directories to your environment. This is most easily done by including the following in your ~/.bashrc file:
export PATH=/usr/local/GSL/[version]/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/GSL/[version]/lib:$LD_LIBRARY_PATH
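Because ~/.bashrc is read by every new shell, plain export lines can add the same directory repeatedly in nested shells. One possible refinement, sketched below, prepends a directory only if it is not already listed; path_prepend is a hypothetical helper, and 1.15 is assumed here as the value for [version] because it is the current version named above.

```shell
# path_prepend VAR DIR: put DIR at the front of the colon-separated list in
# the variable named VAR, unless it is already present. Hypothetical helper.
path_prepend() {
    eval "current=\${$1:-}"
    case ":$current:" in
        *":$2:"*) ;;                                    # already listed
        *) eval "$1=\"\$2\${current:+:\$current}\"" ;;  # prepend
    esac
    export "$1"
}

# Assumption: version 1.15 stands in for [version].
GSL_ROOT=/usr/local/GSL/1.15
path_prepend PATH "$GSL_ROOT/bin"
path_prepend LD_LIBRARY_PATH "$GSL_ROOT/lib"
```

When LD_LIBRARY_PATH starts out empty, this form also avoids leaving a trailing colon in the variable.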
Debuggers and memory/thread profilers are often tied to a specific compiler suite and intended for use only with it; however, some work across compiler suites as well. Also included here are a couple of generic debuggers and profilers.
GDB is part of the GNU project and is available on all nodes by default. Documentation is available on the GDB website and by typing "man gdb".
Valgrind is a common tool-chain for memory-profiling and debugging. Documentation is available on the Valgrind website or by typing "man valgrind".
The Intel debugger comes with the Intel compiler suite in 32- and 64-bit flavors. "idb" can be found in your PATH after sourcing the Intel compiler set-up script (see Intel Compilers above).
See Intel docs linked above
The PGI compilers come with a graphical debugger and memory profiler (pgdbg). Using the GUI requires X to be installed on your workstation, however it will drop to a console-only version when X is not available. The debugger is present in your PATH after sourcing the appropriate PGI set-up script (see PGI Compilers above).
See PGI docs linked above
While not usually appropriate for high-performance calculations or distributed memory tasks, scripting languages can be very useful when managing jobs or processes at a higher level, sorting data or doing any number of simple tasks. Biowulf includes many scripting languages, which are made available by the operating system and by the Biowulf staff.
Please see http://helix.nih.gov/Applications/scripting.html for more information about scripting languages.