NAMD Benchmarks

Benchmark 1: ApoA1 benchmark from the NAMD suite. 500 steps, 92K atoms, 12 Å cutoff + PME every 4 steps. (August 2006)
[Figures: apoa1, apoa1_log]

All parallel jobs on the Biowulf cluster should run at 70% efficiency or better, to ensure maximum utilization of the cluster resources. Based on this set of benchmarks, apoa1 and similar jobs should be submitted to about 8 p2800, o2200, or o2800 Gigabit Ethernet nodes (16 processors), or up to 16 Infiniband nodes (32 processors). Other types of jobs may scale differently; see the Biowulf NAMD page for examples.

To find the most appropriate number of nodes for a specific type of job, it is essential to run one's own benchmarks.
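As a starting point for such benchmarks, the following minimal sketch computes parallel efficiency from wallclock times. It assumes efficiency is defined as T(1) / (N × T(N)), which appears consistent with the values on this page but is not stated explicitly here. The function name `parallel_efficiency` and the `timings` dictionary are illustrative; the sample timings are a subset of the p2800 gige column of Benchmark 1.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Return parallel efficiency in percent, assuming T(1) / (N * T(N)) * 100."""
    return 100.0 * t_serial / (n_procs * t_parallel)


# Wallclock times in seconds keyed by processor count
# (subset of the p2800 gige column of Benchmark 1).
timings = {1: 1970, 2: 1047, 8: 300, 16: 178, 32: 98}

for n_procs, seconds in sorted(timings.items()):
    eff = parallel_efficiency(timings[1], seconds, n_procs)
    flag = "" if eff >= 70 else "  <- below 70% target"
    print(f"{n_procs:3d} procs  {seconds:5d} s  {eff:5.1f}%{flag}")
```

Replacing the entries in `timings` with wallclock times from one's own runs gives the same kind of efficiency column for any job.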

                Wallclock time in seconds (efficiency, %)
# processors    p2800 gige          o2200 gige          o2800 gige          o2800 ib
                2.8 GHz Xeon        2.2 GHz Opteron     2.8 GHz Opteron     2.8 GHz Opteron
                Gigabit Ethernet    Gigabit Ethernet    Gigabit Ethernet    Infiniband
                Intel compiler      Intel compiler      Intel compiler      Pathscale compiler
 1              1970 (100)          1631 (100)          1163 (100)          1125 (100)
 2              1047 (94)            844 (97)            612 (95)            575 (98)
 4               547 (90)            447 (91)            322 (90)            298 (94)
 6               378 (87)            313 (87)            234 (83)            199 (94)
 8               300 (82)            249 (82)            177 (82)            150 (94)
10               253 (78)            211 (77)            189 (61)            130 (87)
12               204 (80)            169 (80)            129 (75)            102 (92)
14               193 (73)            158 (74)            129 (64)             95 (85)
16               178 (69)            145 (70)            120 (61)             87 (81)
18               140 (78)            116 (78)             87 (74)             71 (88)
20               134 (74)            119 (69)             84 (69)             67 (83)
24               118 (70)            106 (64)             79 (61)             54 (88)
28               103 (69)             86 (68)             65 (64)             50 (81)
32                98 (63)             87 (58)             62 (62)             47 (75)


Benchmark 2: Water sphere simulation, courtesy of Jeff Forbes, NIAMS. (August 2006)
[Figures: watersphere1, watersphere2]

Based on these benchmarks, to obtain at least 70% efficiency, this job could be run on about 24 processors (12 nodes) on p2800, o2200, or o2800 Gigabit Ethernet nodes. Efficiency drops much more slowly on the Infiniband nodes, so the job could use up to 40 or 50 processors (20-25 nodes) there.
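The selection described above can be sketched in a few lines: given wallclock times per processor count, pick the largest count that still meets the 70% target. The helper `max_procs_at_target` is a hypothetical name, and the efficiency formula T(1) / (N × T(N)) is an assumption inferred from the values on this page. The timings are the o2200 gige and o2800 ib columns of Benchmark 2.

```python
def max_procs_at_target(timings, target=70.0):
    """Largest processor count whose efficiency meets the target (percent).

    Efficiency is assumed to be T(1) / (N * T(N)) * 100.
    """
    t1 = timings[1]
    meets_target = [n for n, t in timings.items() if 100.0 * t1 / (n * t) >= target]
    return max(meets_target)


# Wallclock times in seconds keyed by processor count (Benchmark 2).
o2200_gige = {1: 5207, 2: 2659, 4: 1377, 6: 1045, 8: 775, 10: 613, 12: 531,
              14: 456, 16: 399, 20: 343, 24: 290, 28: 273, 32: 265, 36: 248, 40: 238}
o2800_ib = {1: 3117, 2: 1593, 4: 816, 6: 589, 8: 424, 10: 343, 12: 282,
            14: 243, 16: 214, 20: 181, 24: 151, 28: 129, 32: 116, 36: 103, 40: 93}

print("o2200 gige:", max_procs_at_target(o2200_gige))
print("o2800 ib:  ", max_procs_at_target(o2800_ib))
```

Because efficiency does not always fall monotonically with processor count, this sketch takes the largest count that meets the target anywhere in the measured range; a stricter rule would stop at the first count that falls below it. For the Infiniband column the result is simply the largest count measured, so the true limit lies beyond the benchmarked range.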

                Wallclock time in seconds (efficiency, %)
# processors    p2800 gige          o2200 gige          o2800 gige          o2800 ib
                2.8 GHz Xeon        2.2 GHz Opteron     2.8 GHz Opteron     2.8 GHz Opteron
                Gigabit Ethernet    Gigabit Ethernet    Gigabit Ethernet    Infiniband
                prebuilt 32-bit     prebuilt 64-bit     prebuilt 64-bit     Pathscale compilers
                binaries            binaries            binaries
 1              7011 (100)          5207 (100)          3355 (100)          3117 (100)
 2              3590 (98)           2659 (98)           1754 (96)           1593 (98)
 4              1838 (95)           1377 (95)            924 (91)            816 (96)
 6              1342 (87)           1045 (83)            649 (86)            589 (88)
 8               991 (88)            775 (84)            490 (86)            424 (92)
10               799 (88)            613 (85)            402 (83)            343 (91)
12               713 (82)            531 (82)            336 (83)            282 (92)
14               578 (87)            456 (82)            292 (82)            243 (92)
16               525 (83)            399 (82)            264 (79)            214 (91)
20               433 (81)            343 (76)            218 (77)            181 (86)
24               375 (78)            290 (75)            186 (75)            151 (86)
28               321 (78)            273 (68)            167 (72)            129 (86)
32               295 (74)            265 (61)            147 (71)            116 (84)
36               255 (76)            248 (58)            139 (67)            103 (84)
40               243 (72)            238 (55)            129 (65)             93 (84)


This document is available as http://biowulf.nih.gov/apps/namd/namd_bench.html