![]() |
|
The NIH Biowulf cluster is a Beowulf parallel processing system designed and built at the National Institutes of Health. Managed by the Helix Systems Staff, Biowulf consists of a main login/administrative node and 1900 compute nodes (4800+ processors) running the Linux operating system. The computational nodes are interconnected by a high speed network and have access to four high-performance NFS RAID fileservers.
Note: For FY2006, some nodes were funded by the National Cancer Institute (NCI). For FY2004, the cluster was partially funded by the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and the National Human Genome Research Institute (NHGRI). NHGRI also partially funded the cluster for FY2003.
Please send comments or problem reports to staff@biowulf.nih.gov.
This study utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, Md. (http://biowulf.nih.gov).
Chief, Helix Systems Staff
NIH
12 Center Drive
Building 12B, Rm 2N207
Bethesda, Md.
20892-5680
Logging in
The Biowulf system is accessed from the NIH network via ssh
to biowulf.nih.gov. This login node runs a secure
shell daemon, and users are
encouraged to login to the cluster using ssh.
Access to
any of the processing nodes is gained from biowulf.nih.gov using
rlogin. The node names are assigned as p#, where # is the node number. Note: Any login session to a compute node which has not been allocated to you will be automatically killed by the batch system.
Xwindows connection software will usually provide better terminal emulation, and is necessary for any graphics connections from the login or computational nodes (e.g. x-povray or interactive SAS). Unix workstations (including Macs with OS X) can handle Xwindows by default. We provide Free Xwindows emulation software for Macs and PCs to Helix Systems users.
NIH password policy requires that users change their password every 180 days. Whenever you reset your password, the 180 day expiration is also reset. For 2 weeks before your password expires, you will be warned when you log in to Biowulf.
To change your default shell, use the command 'chsh -s shell_name', where shell_name is the full pathname of the desired shell (e.g., /bin/csh).
If you do not read mail on helix, be sure to set up a .forward file in your helix home directory.
You can communicate with the Biowulf/Helix staff by sending email to staff@biowulf.nih.gov
The Biowulf cluster is a distributed memory parallel computer consisting of a total of over 4800 Opteron and Xeon processors with an aggregate floating-point performance of 18 TFLOPS. One node is a 32-processor/96 GB SGI Altix 300 system. This special node should be used only for jobs which require more than 4 GB memory. See the Firebolt page (http://biowulf.nih.gov/firebolt.html) for information on submitting jobs to the Altix system. The table below lists the current hardware configuration of the nodes.
| # of nodes | processors per node | memory | network |
| 232 |
4 x 2.8 GHz
AMD Opteron 290 1 MB secondary cache |
8 GB | 1 Gb/s ethernet |
| 471 |
2 x 2.8 GHz
AMD Opteron 254 1 MB secondary cache |
40 x 8 GB 226 x 4 GB 91 x 2 GB |
1 Gb/s ethernet 160 x 8 Gb/s Infiniband |
| 289 | 4 x 2.6 GHz
AMD Opteron 285 1 MB secondary cache |
8 GB | 1 Gb/s ethernet |
| 389 | 2 x 2.2 GHz
AMD Opteron 248 1 MB secondary cache |
129 x 2 GB 66 x 4 GB |
1 Gb/s ethernet 80 x 2 Gb/s Myrinet |
| 91 | 2 x 2.0 GHz
AMD Opteron 246 1 MB secondary cache |
48 x 1 GB 43 x 2 GB |
1 Gb/s ethernet 48 x 2 Gb/s Myrinet |
| 390 | 2 x 2.8 GHz
Xeon 512 kB secondary cache |
119 x 1 GB 207 x 2 GB 64 x 4 GB |
64 x 2 Gb/s Myrinet 100 Mb/s ethernet |
| 1 | 32 x 1.4 GHz
SGI Itanium 2 |
96 GB | 1 Gb/s ethernet |
In addition, for those applications that can benefit from the increased bandwidth and low latency, there are two very high performance switched networks available: 184 dual-processor nodes have an additional connection to Infiniband using QLogic InfiniPath adapters and connected to a Voltaire ISR 9288 switch. Infiniband provides 10 Gb/s bandwidth and the adapters connect directly to the Opteron HyperTransport bus allowing for very low latencies. In addition, 165 dual-processor nodes have a second connection to Myrinet. This high performance network has a bandwidth of 2 Gb/s with a latency of around 10uS. The nodes are interconnected by two 128-port Myricom Clos-128 switches.
The login node (biowulf.nih.gov) is an HP/Compaq Proliant ML370 (dual-processor 3.2 GHz Xeon) with 4 GB RAM, Compaq SA 5300 RAID-1 Controller, 36GB 15k rpm disk drives, and 3 additional Syskonnect Gigabit Ethernet interfaces (64-bit/66-MHz).
| Location | Creation | Backups | Performance | Amount of Space | Accessible from (*) |
|
| /home | network (NFS) | with Biowulf account | yes | high | 500 MB (quota)(**) |
B,C,D |
| /scratch (nodes) | local | created by user | no | best? (***) | 30 - 130 GB
dedicated while node is allocated |
C |
| /scratch (biowulf) | network (NFS) | created by user | no | low | 120 GB shared | B,H,N,D |
| /data | network (NFS) | with Biowulf account | yes | high | based on quota (48 GB default) |
B,C,H,N,D |
Note: your /data directory is actually physically located on on filesystems named /data1 through /data6. The /data directory consists of links to one of those filesystems. Please refer to your data directory through the /data links as opposed to the physical location because the physical location is subject to change. In other words, use /data/yourname rather than (for example) /data6/b/yourname in your scripts.
You will still be able to access migrated files exactly as you did before they were migrated. This is due to the fact that a symbolic link is created in place of the original file. As with files on /data, migrated files will be accessible from all Helix Systems, including the computational nodes of the Biowulf Cluster.
For example, before migration:
% ls -lAfter migration:
-rw-r----- 1 joeu joeu 5715508 Dec 13 1999 jobout.dat
% ls -lor, using the "-L" switch (which "follows" the link)
lrwxrwxrwx 1 root root 34 Feb 12 16:13 jobout.dat -> /dest2/data4/joeu/jobout.dat
% ls -lLOne consequence of using symbolic links as "place holders" for the original file is that if you wish to delete a file that has been migrated, you should first delete the migrated copy, and then the link that points to it. Continuing the example above:
-rw-r----- 1 joeu joeu 5715508 Dec 13 1999 jobout.dat
% rm /dest2/data4/joeu/jobout.datDifferences between primary and near-line storage:
% rm jobout.dat
$ checkquota
Mount Used Quota Percent Files /data7: 5.95 GB 48.00 GB 12.40% 5007 /home: 359.39 MB 500.00 MB 71.88% 3510
There are 3 categories of nodes in the Biowulf cluster:
There are many batch systems available; the one used on the Biowulf
cluster is Altair's PBS Pro.
The batch script (or batch command file) is simply a Linux shell
script with optional commands to PBS (called directives).
Using the Biowulf Cluster
The Biowulf Cluster is highly heterogeneous with respect to processor
speed, memory size and networks. And with the advent of multi-core
processor chips, nodes can contain either 2 or 4 processors.
Using the Batch System on Biowulf
Batch (or queueing) systems allow the user to submit jobs to the
cluster without regard for what resources (cpu, memory, networks) are
available. When those resources become available the batch system
will start the job.
Submitting Jobs
In order to run a job under batch you must submit a script which
contains the commands to be executed. The command to do this is qsub.
The qsub command on Biowulf is heavily customized and this
documentation should be used in lieu of that provided by Altair.
Sample Batch Script
#!/bin/bash # # this file is myjob.sh # #PBS -N MyJob #PBS -m be #PBS -k oe # cd $PBS_O_WORKDIR myprog < /data/me/mydata |
One could simply run this as a shell script on the command line, however when executed by the batch system, the lines beginning with #PBS have special meaning. These directives are as follows:
-N: name of the batch job.
-m: send mail when the job begins ("b") and when it ends ("e").
-k: keep the STDOUT ("o") and STDERR ("e") files.
Other directives can be found on the qsub man page.
The environment variable $PBS_O_WORKDIR is the directory which was the current directory when the qsub command was issued.
Note that if the "myprog" program above ordinarily sends its output (STDOUT) to the terminal, that output will appear in a file called MyJob.o<jobnumber>. Any errors (STDOUT) will appear as MyJob.e<jobnumber>.
qsub -l nodes=1 myjob.batThe resource list (-l) here consists of the number of nodes (1). You must specify a number of nodes, even if it is one.
The batch system allocation is done by nodes (not processors). When you are allocated a node by the batch system, your job is the only job running on the node. Since a node may contain 2 or 4 processors, you should ensure that you utilize the node fully (see the swarm program below).
Parallel jobs running on more than 2 cpus will need to specify more than 1 node. Additionally, since you will want a parallel job running on nodes of identical clock speed, you will probably want to specify a node type as well:
qsub -l nodes=8:o2800 myparalleljobNode types currently supported are (from fastest to slowest):
o2800 2.8 GHz Opteron o2200 2.2 GHz Opteron o2000 2.0 GHz Opteron p2800 2.8 GHz XeonNodes can also be selected based on memory size:
m2048 2 GB (1 GB/processor) m4096 4 GB (2 GB/processor) m8192 8 GB (4 GB/processor)(Note: All nodes have at least 1 GB. Also, the above memory properties are normalized for dual-processor nodes, ie, 4 GB 2p nodes and 8 GB 4p nodes both have m4096 properties).
Nodes can also be selected based on network:
gige gigabit ethernet ib infiniband myr2k myrinet(Note: only the p2800 nodes are not gige connected).
Other properties:
dc dual-core node (4p) x86-64 64-bit nodeNote: Don't make additional specifications unless you need to! In most cases, the less you specify the better. The more flexibility the scheduler has in allocating nodes for you job, the less likely your job will wind up having to wait for specific resources.
Additional Notes:
-r y|n this switch controls whether your job is restartable. If
one of the nodes that your job is running on should hang,
the operations staff will often restart the job. If you
do not want your job to be restarted, use "-r n" with qsub.
-v varlist this switch allows you to pass one or more variables from
the qsub command line to your batch script. For instance,
with "qsub -v np=4 myjob", the myjob script can use the $np
variable with a value of 4. You can also list an environment
variable without a value, and that envvar will be exported
with its current value.
-I for interactive jobs, see below.
-V export all environment variables to the batch script. Useful
when running X11 clients on interactive nodes, see below.
See the qsub man page for additional options.
The following example shows how to allocate 4 nodes:
biobos$ qsub -I -V -l nodes=4Note the following:
qsub: waiting for job 2011.biobos to start
qsub: job 2011.biobos ready
p139$ cat $PBS_NODEFILE
p139
p138
p137
p136
p139$ exit
logout
qsub: job 2011.biobos completed
biobos$
qstat -u <username> list jobs belonging to username qdel <jobid> delete job jobid qdel -Wforce <jobid> delete a job on a hung node qselect -u <user> -s <state> select jobs based on criteriaFor example, to delete all of your current jobs:
qdel `qselect -u myname`To delete only your queued jobs:
qdel `qselect -u myname -s Q`
#!/bin/bash |
Note how this script runs 2 instances of a program by putting them in the background (using the ampersand "&"), and then using the shell wait command to make the script wait for each background process before exiting.
When running many single-threaded jobs, setting up many batch command files can be cumbersome. The swarm command can be used to automatically generate batch command files and submit them to the batch system.
swarm allows you to submit an arbitrary number of serial jobs to the batch system by simply creating a command file with one command per line. swarm automatically creates batch command files and submits them to the batch system. Two commands are submitted for each node, making optimal use of the processors.
Here is an example command file:
myprog -param a < infile-a > outfile-a |
The command file is submitted using the following command:
swarm -f cmdfileThe result is 4 jobs submitted to the batch system, 3 jobs with 2 processes each, and the last with a single process.
When submitting a very large swarm (1000s of processes), the bundle option to swarm should be used:
swarm -f cmdfile -b 40
If cmdfile contains 2500 commands, approximately 63 bundles of 40 commands each will be created and submitted as 32 batch jobs (2 bundles per job, one for each processor on a node).
Note: swarm will correctly allocate the correct number of processes to 2- or 4-processor nodes.See the swarm documentation (http://biowulf.nih.gov/swarm.html) for more details.
The basic procedure is as follows (generation of these scripts can be done automatically by writing a higher order script):
1. Create an executable shell script which will run multiple instances of your program. Which will run depends on the "mpi task id" of the instance.
#!/bin/tcsh |
2. Use mpirun in your batch command file to run the mpi shell program (multirun):
#!/bin/tcsh |
3. Submit the job to the batch system:
qsub -l nodes=3 myjob.sh
This job will run 6 instances of the program on 3 nodes.
Monitoring Your Jobs
Once a batch job has been submitted it can be monitored using
both
command line and web-based tools.
Using either the jobload command or the user job monitor (see below), you can determine the overall behavior of your job based on the load of each node. Perfectly behaving jobs will have loads of near 2.0 on all nodes. If the nodes in a parallel job are running at loads below 1.5 (or are green), the job probably isn't scaling well, and you should rerun the job with fewer nodes. Loads of less than 1.5 for non-parallel jobs may mean that only one processor is being used, or that the job is i/o bound. Contact the Biowulf staff for help in deciding whether your job is making best use of its resources.
The sections below provide information about various ways of monitoring your jobs.
% freen
m1024 m2048 m4096 Total
------------- GeneralPool -------------
o2800 / 17/91 1/241 18/332
o2200 / 84/223 27/44 111/267
o2000 / 6/40 / 6/40
p2800 31/80 125/204 63/63 219/347
p1800 31/33 4/4 / 35/37
------------- Myrinet -------------
o2200 / 11/67 / 11/67
o2000 16/48 / / 16/48
p2800 22/38 / / 22/38
------------- Infiniband -------------
o2800 / / 4/60 4/60
------------- Reserved -------------
o2800 / / 14/15 14/15
o2600 / / 0/39 0/39
o2200 / / 4/14 4/14
$ jobload juser Jobs for juser Node Load 455157.biobos p488 100% 455260.biobos p331 100% 455261.biobos p405 100% 455262.biobos p451 99% 455263.biobos p452 100% 455265.biobos p1425 100% 561609.biobos p506 99% 699962.biobos p416 50% 812953.biobos p744 50% User Average: 88%All but 2 nodes allocated to this user are running to capacity.
$ jobload 869186
869186.biobos Node Load
p554 100%
p555 100%
p557 100%
p564 100%
p565 100%
p566 101%
p567 100%
p568 100%
Job Average: 100%
This shows a well-behaved parallel job.
# jobload 763334
763334.biobos Node Load
p1495 25%
p1497 0%
p1498 0%
p1500 0%
p1501 0%
p1503 0%
p1504 0%
p1507 0%
Job Average: 3%
This last example shows a improperly running
parallel job.

The color of the dot indicates the load:
Clicking on the dot corresponding to a node will result in a display of the process, cpu, disk and memory status for that node (output from the top and df commands).The status of specific batch jobs can be monitored by first clicking on List Status of Batch Jobs which gives output similar to qstat, and then clicking on the Job ID of the job of interest. This results in the display of a matrix which contains dots only for the nodes allocated to the job. The loads of those nodes can be monitored in same way as for the system as a whole.
Finally, the sum total of nodes allocated across all jobs to a user can be monitored by clicking on Username.
Again, for both JobID and Username monitoring, clicking on a dot corresponding to a node results in a display of information about the node.
Parallel Jobs
Due to the distributed memory architecture of clusters,
parallel
programming on the system is generally done by explicit message
passing. Most programs are written in C or Fortran using the
MPI message
passing library (see below).
#!/bin/bash |
The batch job is submitted with the qsub command:
qsub -v np=16 -l nodes=8 mpi-hello.sh
In this example a batch file with the name mpi-hello.sh is submitted to the batch queue with a job name of "MyJob". Note that many switches to qsub can be specified either on the command line or as part of the script by use of the special comment string (also called a directive) #PBS. The job runs a 16-process mpi program named mpi-hello. The "-m" switch specifies that mail should be sent to the user when the job begins and when it ends. The "-k" switch specifies which output files should be "kept", the argument of "oe" specifies that both stdout and stderr files are written.
Other important features of the example above:
Here is an example of a complete batch command file for running a job over Myrinet/GM:
#!/bin/csh |
The batch job is submitted with the qsub command:
qsub -v np=8 -l nodes=4:myr2k myjob.bat
Note the node specifier "4:myr2k", which results in a request for 4 nodes with the "myr2k" property (i.e., connected to Myrinet 2000).
% cd ~/.ssh % cp /usr/local/etc/ssh_config_ib config % chmod 600 config(If you already have a ssh config file, you should append the contents of /usr/local/etc/ssh_config_ib to it).
Use the "ib" property when submitting to the infiniband nodes, for example:
qsub -v np=16 -l nodes=8:ib myibjob.bat
You can determine the total number of processors in each pool by using the 'batchlim' command:
$ batchlim
(only the relevant lines are shown here)
Max CPUs Max CPUs
Per User Available
---------- -----------
ib2 48 74
ib 128 196
The "ib2" pool is for small jobs, the "ib" pool
for large jobs. The batchlim output shows that
there are 196 and 74 processors respectively
assigned to the ib and ib2 pools. The maximum
per-user limits are 128 and 48 processors
respectively.
Programming Tools & Libraries
All nodes run the Redhat Linux operating system, including
the GNU
EGCS C compiler (gcc) and GNU Fortran (g77).
Two potentially higher performance compiler suites are also available:
Portland Group compilers include: C (pgcc), C++ (pgCC), Fortran 77 (pgf77), and Fortran 90 (pgf90). This compiler suite also includes the PGPROF profiler (pgprof) and PGDBG debugger (pgdbg), each with an X window interface. Documentation for the Portland Group compilers is available on-line:
http://biowulf.nih.gov/doc/pgdoc.
Intel compiler software includes a C/C++ compiler (icc), a Fortran compiler (ifort) and an application debugger (idb). Documentation for Intel's compilers and debugger is on-line:
C/C++:
http://biowulf.nih.gov/doc/intel/cc10/Doc_Index.htm
Fortran:
http://biowulf.nih.gov/doc/intel/fc10/Doc_Index.htm
Application Debugger:
http://biowulf.nih.gov/doc/intel/idb10/Doc_Index.htm
Pathscale Compilers are available for users that want optimized binaries on our 64-bit Opteron nodes. There is a debugger available as well as an option generator that will analyze source code and suggest compiler options. Documentation for the Pathscale Compilers can be found here (PDF Format):
User Guide
http://biowulf.nih.gov/doc/pathscale/UserGuide.pdf
Debugger
http://biowulf.nih.gov/doc/pathscale/PathDB_UserGuide.pdf
Optimization Generator
http://helixweb.cit.nih.gov/doc/pathscale/pathOGen.txt
Compiling with the Intel, Portland Group or Pathscale compilers requires that the user set up their shell environment by sourcing the appropriate script. The following table shows the various compiler versions available on Biowulf, environment set-up scripts (one for C-Shell users and one for Bash users) and the location/availability of an MPICH and/or MPICH-GM installation for each compiler.
GCC 4.1.2 is available for those users who need support for the newer compilers (gfortran in particular), require gcj (for compiling Java code into native binaries) or need any of the additional features found in the GCC 4 suite.
| Compilers and set-up scripts available on Biowulf: | ||||
| C/C++ | Front-end | Environment Setup | MPICH Home | MPICH-GM Home |
| GCC 3.2.3 | gcc | Default | /usr/local/mpich | /usr/local/mpich-gm2k |
| GCC 3.2.3 x86_64 | gcc | Default on x86_64 nodes | Available on request | n/a |
| PGI 7.1 | pgcc (C) pgCC (C++) |
% source /usr/local/pgi/pgivars.sh % source /usr/local/pgi/pgivars.csh |
/usr/local/mpich-pg | /usr/local/mpich-gm2k-pg |
| PGI 7.1 x86_64 | pgcc (C) pgCC (C++) |
% source /usr/local/pgi/pgivars.sh % source /usr/local/pgi/pgivars.csh |
/usr/local/mpich-pg64 | n/a |
| Intel v10.1 | icc |
% source /usr/local/intel/intelvars.sh % source /usr/local/intel/intelvars.csh |
/usr/local/mpich-i (C only) |
/usr/local/mpich-gm2k-i (C only) |
| Intel v10.1 x86_64 | icc |
% source /usr/local/intel/intelvars.sh % source /usr/local/intel/intelvars.csh |
/usr/local/mpich-i64 (C only) | n/a |
|
Pathscale 2.5 (32- & 64-bit) |
pathcc (C) pathCC (C++) |
% source /usr/local/pathscale/pathvars.sh % source /usr/local/pathscale/pathvars.csh |
/usr/local/mpich-ps /usr/local/mpich-ps64 |
/usr/local/mpich-gm2k-ps |
| GCC 4 |
gcc (C) g++ (C++) |
% source /usr/local/gcc/gccvars.sh % source /usr/local/gcc/gccvars.csh |
/usr/local/mpich-gcc4 |
/usr/local/mpich-gm2k-gcc4 |
| GCC 4 x86_64 |
gcc (C) g++ (C++) |
% source /usr/local/gcc/gccvars.sh % source /usr/local/gcc/gccvars.csh |
/usr/local/mpich-gcc4_64 |
n/a |
| Fortran77/90 | Front-end | Environment Setup | MPICH Home | MPICH-GM Home |
| GCC 3.2.3 | g77 | Default | /usr/local/mpich | /usr/local/mpich-gm2k |
| GCC 3.2.3 x86_64 | g77 | Default on x86_64 nodes | Available on request | n/a |
| PGI 7.1 |
pgf77 pgf90 |
% source /usr/local/pgi/pgivars.sh % source /usr/local/pgi/pgivars.csh |
/usr/local/mpich-pg | /usr/local/mpich-gm2k-pg |
| PGI 7.1 x86_64 |
pgf77 pgf90 |
% source /usr/local/pgi/pgivars.sh % source /usr/local/pgi/pgivars.csh |
/usr/local/mpich-pg64 | n/a |
| Intel v10.0 | ifort |
% source /usr/local/intel/intelvars.sh % source /usr/local/intel/intelvars.csh |
/usr/local/mpich-i | /usr/local/mpich-gm2k-i |
| Intel v10.0 x86_64 | ifort |
% source /usr/local/intel/intelvars.sh % source /usr/local/intel/intelvars.csh |
/usr/local/mpich-i64 | n/a |
|
PathScale 2.5 (32- & 64-bit) |
pathf90 |
% source /usr/local/pathscale/pathvars.sh % source /usr/local/pathscale/pathvars.csh |
/usr/local/mpich-ps /usr/local/mpich-ps64 |
/usr/local/mpich-gm2k-ps |
| GCC 4 (Fortran 77,90 & 95) |
gfortran |
% source /usr/local/gcc/gccvars.sh % source /usr/local/gcc/gccvars.csh |
/usr/local/mpich-gcc4 |
/usr/local/mpich-gm2k-gcc4 |
| GCC 4 x86_64 (Fortran 77,90 & 95) |
gfortran |
% source /usr/local/gcc/gccvars.sh % source /usr/local/gcc/gccvars.csh |
/usr/local/mpich-gcc4_64 |
n/a |
There are 2 MPI library implementations available on Biowulf: MPICH and MPICH-GM. MPICH is a popular implementation of the MPI library. Interprocess communicaction using MPICH occurs using TCP/IP sockets, and so allows communication over virtually any network technology. However, communicating over TCP/IP has significant overhead resulting in realtively high latencies which can limit the scalability and performance of codes with demanding communication requirements.
MPICH-GM is a version of MPICH written to make calls directly to the Myrinet GM protocol layer, thereby bypassing the TCP/IP protocol stack. This should result in significantly improved latency and bandwidth. Check the hardware table above for the nodes with connections to Myrinet.
A User Guide for MPICH and man pages are available on-line. The user guide and guide to MPE (MPICH MPI Extensions) are also available in the form of postscript files:
User's Guide for mpich, v 1.2.0 /usr/local/doc/mpich-1.2.0-usersguide.ps.gz
User's Guide for MPE, v 1.2.0 /usr/local/doc/mpich-1.2.0-mpeguide.ps.gz
After deciding which MPI library you wish to use, it is very important to make sure your PATH is properly set (see below).
The instructions provided for running MPI in this section are for running MPI jobs interactively. Instructions for running jobs in batch are given in the next section below.
The table below indicates the current software versions available on the Biowulf cluster.
| Biowulf | computational nodes (Xeon) |
computational nodes (Opteron) |
computational nodes (Opteron) |
computational nodes (Opteron/Myrinet) |
computational nodes (Opteron/IB) |
|
| Redhat | RHEL 3.9 WS 32-bit |
RHEL3-WS 32-bit |
RHEL3-WS 64-bit (*) |
CentOS 5.0 (**) 64-bit (*) |
RHEL3-WS 32-bit |
CentOS 4.2 (**) 64-bit (*) |
| Linux kernel | 2.4.21-37.EL | 2.6.10 | 2.6.9 | 2.6.18-8.el5 | 2.6.10 | 2.6.9-22.EL |
| C library | glibc-2.3.2 | glibc-2.3.2 | glibc-2.3.2 | glibc-2.5 | glibc-2.3.2 | glibc-2.3.4 |
| GNU gcc & g77 | 3.2.3 | 3.2.3 | 3.2.3 | 4.1.1 | 3.2.3 | 3.4.4 |
| Portland Group C, C++ and Fortran |
6.1 | 6.1 (64-bit) | 6.1 (64-bit) | |||
| Intel C, C++ and Fortran |
10.1 | 10.1 (64-bit) | 10.1 (64-bit) | |||
| PathScale C, C++ and Fortran |
2.5 | 2.5 (64-bit) | 2.5 (64-bit) | |||
| PathScale InfiniPath |
3.1 | |||||
| FFTW | 3.1 | |||||
| MPICH | 1.2.7 | MPICH-GM | 1.2.6..14a | |||
| GM | 2.0.6 | |||||
| PBS Pro | 8.0 | |||||
| MySQL | 5.0.16 | |||||
| Sun Java 2 SDK | 1.6.0 | |||||
| wxPython | 2.5.2 | |||||
|
ACML Includes BLAS and LAPACK routines. |
3.0.0 | |||||
Compiling MPI Programs
Biowulf has a number of compilers and mpi libraries to choose
from. The table below summarizes which compilers are available for
each MPI library, and the libraries under which they were built. If
you require a particular combination not listed, contact the Biowulf
systems staff.
| MPICH | MPICH-GM | InfiniPath | |
| GNU gcc | x | x | |
| GNU g77 | x | x | |
| Portland Group pgcc | x | x | |
| Portland Group pgCC | x | x | |
| Portland Group pgf77 | x | x | |
| Portland Group pgf90 | x | x | |
| Intel icc | x | ||
| Intel ifort | x | ||
| PathScale pathcc | x | x | |
| PathScale pathCC | x | x | |
| PathScale pathf90 | x | x |
There are 2 steps in compiling your code:
| MPICH | MPICH-GM | InfiniPath | |
| GNU Compilers |
/usr/local/mpich/bin or /usr/local/mpich-gnu/bin |
/usr/local/mpich-gm2k/bin or /usr/local/mpich-gm2k-gnu/bin |
- |
| Portland Group Compilers |
/usr/local/mpich-pg/bin /usr/local/mpich-pg64/bin |
/usr/local/mpich-gm2k-pg/bin | - |
| Intel Compilers |
/usr/local/mpich-i/bin | - | - |
| PathScale Compilers |
/usr/local/mpich-ps/bin /usr/local/mpich-ps64/bin |
- | no special PATH (/usr/bin) |
| c wrapper | c++ wrapper | Fortran 77 wrapper |
Fortran 90 wrapper |
|
| GNU compilers | mpigcc | mpiCC | mpif77 | |
| Portland Group compilers |
mpicc | mpiCC | mpif77 | mpif90 |
| Intel compilers | mpicc | mpiCC | mpif77 | mpif90 |
| PathScale compilers | mpicc | mpiCC | mpif90 |
The simplest syntax for compiling using the mpich wrappers is:
mpicc -o prog proc.c
biowulf$ export PATH=/usr/local/mpich-pg/bin:$PATH
biowulf$ mpif77 -o MyPi pi3.f
Linking:
biowulf$ ls -l MyPi
-rwxr-xr-x 1 steve wheel 414340 Feb 16 15:11 MyPi
biowulf$
% qsub -l nodes=1:ib -I(note: if all ib nodes are allocated to jobs, you'll have to wait until one becomes free. Please do not login directly to ib nodes.
You do not need to set any special PATH to access the mpi wrapper scripts. Be sure to "exit" from the interactive job when finished to return the node to the available pool of nodes.
Note: because the software environment is different from that on any other nodes, executables built on ib nodes will probably not run on any other nodes.
1. Allocate the appropriate number of interactive nodes:
qsub -I -V -l nodes=#The batch system will allocate '#' nodes, and will log you onto the 'master' node.
2. Startup your program with
mpirun -machinefile $PBS_NODEFILE -np 8 myprog3. Exit from the nodes to complete the batch job. (IMPORTANT!)
exitnote: make sure your PATH is set correctly! The path selects mpirun in the same way it selects the mpi compile command. $PBS_NODEFILE is an environment parameter that is automatically set by the batch system. This file contains the list of nodes allocated to your job. (you can 'cat $PBS_NODEFILE' to check).
Benchmarking
Biowulf nodes are heterogeneous with respect to architecture
(x86_64,
i686), processor speed (between 1.8 Ghz and 2.8 GHz),
memory size (between 1 and 4 GB), and networks (1 Gb/s, 100 Mb/s). The batch system will not
distinguish amongst nodes with differing cpu clock speeds, memory
sizes or networks. This will have no consequences for most production codes
(other than varying runtimes), however, if you are running benchmarks,
you will want to specify processor speed and memory size:
qsub -l nodes=4:p2800:m2048 myjob.bat run on 2.8 GHz Xeon/2 GB nodes
qsub -l nodes=4:o2800:m4096 myjob.bat run on 2.8 GHz Opteron/4 GB nodes