Graphics processing units (GPUs) are specialized microprocessors originally designed for video and rendering. More recently, compute-intensive programs in the life sciences have been ported to GPUs to explore potential performance benefits of their massive compute power.
As part of its role to integrate new technologies into the production environment, the Biowulf staff has installed 16 GPU nodes into the cluster as a pilot project. The purpose of the pilot is to:
- Identify those applications currently available for use with GPUs; evaluate and integrate them into the Biowulf production environment
- Identify those user simulations which will most benefit from running on GPU systems
- Develop or port new applications to run on GPUs
- Determine the cost and energy effectiveness of using GPU technology
Note that running on the GPU nodes is not a guarantee of improved performance. It is vital to run your own benchmarks to determine the effectiveness of using the GPUs. The Biowulf staff is very interested in any GPU benchmarks; please let us know about them at firstname.lastname@example.org.
- 2 x Intel Xeon X5650 (2.67 GHz), each: 6 cores
- 2 x Nvidia M2050 (Tesla Fermi), each: 2.8 GB memory, 448 cores
- 48 GB DDR3 memory
- 450 GB disk (7200 rpm SATA)
- Each node is connected to two networks: QDR Infiniband (32 Gb/s) and 1 Gb/s ethernet.
- CentOS 5.5 Linux Operating System
- CUDA 4.1
- Nvidia driver 295.33
GPU nodes can be allocated using the "gpu2050" property:
% qsub -l nodes=1:gpu2050 gpujob.bat
% qsub -l nodes=4:gpu2050 pjob.bat
Initial testing or compiling can be done with an interactive session:
% qsub -l nodes=1:gpu2050 -I
Interactive sessions on gpu nodes have a maximum walltime of 24 hours, but please log out of interactive sessions as soon as you're finished with them.
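As an illustration of what a batch script such as gpujob.bat could contain, the sketch below writes a minimal one to disk. The job name, the choice of the cuda/4.1 module, and the program name my_gpu_app are placeholders assumed for this example, not taken from the Biowulf configuration; PBS_O_WORKDIR is the standard PBS variable holding the directory from which qsub was run.

```shell
# Write a minimal example batch script to gpujob.bat.
# ASSUMPTION: "my_gpu_app" is a hypothetical program name; replace it with
# your own GPU-enabled executable.
cat > gpujob.bat <<'EOF'
#!/bin/bash
#PBS -N gpujob
# Run from the directory the job was submitted from.
cd "$PBS_O_WORKDIR"
# Load a CUDA environment (the version here is only an example).
module load cuda/4.1
./my_gpu_app
EOF
```

The resulting file can then be submitted with the qsub command shown earlier, requesting the gpu2050 property.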
There are 32 GPUs in the pilot cluster. Since it is possible (application dependent) to share the GPUs amongst processes running on the Intel Nehalem CPU cores, additional performance may be gained by running with CPU:GPU ratios of 2:1, 3:1 or more. (See the Biowulf NAMD GPU page for an example.) Distributed memory codes may also benefit by running over the Infiniband network instead of gigabit Ethernet.
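The CPU:GPU ratios above come down to simple arithmetic: with 2 GPUs and 12 cores per node, a 3:1 ratio means 6 processes per node, and 6:1 is the most a node can hold. The following sketch computes the process count and assigns GPUs to processes round-robin. The function names and the notion of a node-local rank are assumptions made for illustration; whether a given application selects its GPU this way is application dependent.

```shell
GPUS_PER_NODE=2    # each pilot node has 2 Tesla M2050 cards
CORES_PER_NODE=12  # 2 x 6-core Xeon X5650

# procs_per_node RATIO
# Print the number of processes per node for a RATIO:1 CPU:GPU ratio,
# or fail if the node does not have enough cores.
procs_per_node() {
    ratio=$1
    n=$(( ratio * GPUS_PER_NODE ))
    if [ "$n" -gt "$CORES_PER_NODE" ]; then
        echo "ratio $ratio:1 needs $n cores; node has $CORES_PER_NODE" >&2
        return 1
    fi
    echo "$n"
}

# gpu_for_rank LOCAL_RANK
# Print the 0-based GPU index for a node-local process rank (round-robin).
gpu_for_rank() {
    echo $(( $1 % GPUS_PER_NODE ))
}

procs_per_node 3   # prints 6
gpu_for_rank 4     # prints 0
```

For applications that honor the CUDA_VISIBLE_DEVICES environment variable, each process could export it with the value printed by gpu_for_rank before starting the GPU program.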
CPU usage on the GPU nodes can be monitored using the jobload utility, as with other Biowulf batch jobs. There is currently no simple way to monitor the GPU usage.
Software packages that include GPU support via CUDA will likely have configuration or Makefile options for specifying the location of the CUDA SDK. On Biowulf, the default CUDA SDK (which includes the compilers, headers and run-time libraries) is located in
/usr/local/CUDA/cuda-4.1/
There are several other versions of CUDA available in /usr/local/CUDA. The easiest way to see available versions or use a specific version to build code is by using the module commands.
[user@biowulf ~]$ module avail cuda
----------------- /usr/local/Modules/3.2.9/modulefiles ----------------
cuda/2.3 cuda/3.0 cuda/3.1 cuda/4.0.17 cuda/4.1 cuda/5.0
[user@biowulf ~]$ module load cuda/5.0
[user@biowulf ~]$ module list
Currently Loaded Modulefiles:
  1) cuda/5.0
Alternatively, you can see the paths set by the module with, for example, 'module display cuda/5.0', and then set them as desired.
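Setting the paths by hand might look like the sketch below, shown for the default CUDA 4.1 install. It assumes the usual CUDA toolkit layout, with bin and lib64 subdirectories under the install root, and uses CUDA_HOME as a conventional variable name; the authoritative values are whatever 'module display' reports for the version you want.

```shell
# Root of the default CUDA install on Biowulf.
CUDA_HOME=/usr/local/CUDA/cuda-4.1
export CUDA_HOME

# Prepend the compiler (nvcc) and run-time library directories.
# ASSUMPTION: bin/ and lib64/ follow the standard toolkit layout;
# confirm the real paths with 'module display cuda/4.1'.
export PATH="$CUDA_HOME/bin:$PATH"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
```

The `${LD_LIBRARY_PATH:+:...}` form appends the previous value only when one exists, which avoids leaving a trailing colon (an empty entry that the loader treats as the current directory).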
Individuals who wish to program their own applications or add GPU support to existing applications will need to learn how GPU-assisted processing works and will likely want to become familiar with NVIDIA's developer resources portal.
The current CUDA programming guide can be found here.
The architecture-specific (Fermi) GPU tuning guide can be found here.
NVIDIA Performance Primitives
NVIDIA distributes the NVIDIA Performance Primitives (NPP), a set of library functions for accelerating the processing of image and video data. NPP is installed in /usr/local/nvidia/NPP_SDK; for documentation and downloads, visit the NVIDIA NPP page.