NAMD Benchmarks
Benchmark 1: ApoA1 benchmark from the NAMD suite. 500 steps, 92K atoms, 12 Å cutoff + PME every 4 steps. (August 2006)
All parallel jobs on the Biowulf cluster should run at 70% efficiency or better, to ensure maximum
utilization of the cluster resources. Based on this set of benchmarks, apoa1 and similar jobs
should be submitted to about 8 p2800, o2200 or o2800 nodes (16 processors), or up to 16 Infiniband
nodes (32 processors). Other types of jobs may scale differently; see the
Biowulf NAMD page for examples.
To find the most appropriate number of nodes for a specific type of job, it is essential to run one's own benchmarks.
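The efficiency figures in parentheses in the tables on this page appear consistent with the usual definition of parallel efficiency, the single-processor time divided by (number of processors × parallel time). The page does not state the formula, so treat that as an assumption. A minimal Python sketch for checking your own timings is shown here; the function name `parallel_efficiency` is hypothetical, and the sample times are taken from the p2800 gige column of Benchmark 1.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Return parallel efficiency in percent: 100 * T(1) / (N * T(N)).

    Assumption: this is the standard definition; it reproduces the
    percentages listed in the benchmark tables on this page.
    """
    return 100.0 * t_serial / (n_procs * t_parallel)


# p2800 gige wallclock times in seconds (Benchmark 1), keyed by processor count.
p2800_times = {1: 1970, 2: 1047, 4: 547, 8: 300, 16: 178, 32: 98}

for n_procs, seconds in p2800_times.items():
    eff = parallel_efficiency(p2800_times[1], seconds, n_procs)
    print(f"{n_procs:3d} processors: {seconds:5d} s, {eff:.0f}% efficiency")
```

Replacing the dictionary with timings from your own short test runs gives the same kind of efficiency column for your job.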
Wallclock time in seconds (efficiency in %)

| # processors | p2800 gige (2.8 GHz Xeon, Gigabit Ethernet, Intel compiler) | o2200 gige (2.2 GHz Opteron, Gigabit Ethernet, Intel compiler) | o2800 gige (2.8 GHz Opteron, Gigabit Ethernet, Intel compiler) | o2800 ib (2.8 GHz Opteron, Infiniband, Pathscale compiler) |
|---|---|---|---|---|
| 1 | 1970 (100) | 1631 (100) | 1163 (100) | 1125 (100) |
| 2 | 1047 (94) | 844 (97) | 612 (95) | 575 (98) |
| 4 | 547 (90) | 447 (91) | 322 (90) | 298 (94) |
| 6 | 378 (87) | 313 (87) | 234 (83) | 199 (94) |
| 8 | 300 (82) | 249 (82) | 177 (82) | 150 (94) |
| 10 | 253 (78) | 211 (77) | 189 (61) | 130 (87) |
| 12 | 204 (80) | 169 (80) | 129 (75) | 102 (92) |
| 14 | 193 (73) | 158 (74) | 129 (64) | 95 (85) |
| 16 | 178 (69) | 145 (70) | 120 (61) | 87 (81) |
| 18 | 140 (78) | 116 (78) | 87 (74) | 71 (88) |
| 20 | 134 (74) | 119 (69) | 84 (69) | 67 (83) |
| 24 | 118 (70) | 106 (64) | 79 (61) | 54 (88) |
| 28 | 103 (69) | 86 (68) | 65 (64) | 50 (81) |
| 32 | 98 (63) | 87 (58) | 62 (62) | 47 (75) |

Benchmark 2: Water Sphere simulation, courtesy Jeff Forbes, NIAMS. (August 2006)


Based on these benchmarks, to obtain at least 70% efficiency, this job could be run on about 24 processors (12 nodes) on p2800, o2200, or o2800 nodes. The efficiency drops much more slowly on the Infiniband nodes, so the job could use up to 40 or 50 processors (20-25 nodes) on the Infiniband nodes.
Wallclock time in seconds (efficiency in %)

| # processors | p2800 gige (2.8 GHz Xeon, Gigabit Ethernet, prebuilt 32-bit binaries) | o2200 gige (2.2 GHz Opteron, Gigabit Ethernet, prebuilt 64-bit binaries) | o2800 gige (2.8 GHz Opteron, Gigabit Ethernet, prebuilt 64-bit binaries) | o2800 ib (2.8 GHz Opteron, Infiniband, Pathscale compilers) |
|---|---|---|---|---|
| 1 | 7011 (100) | 5207 (100) | 3355 (100) | 3117 (100) |
| 2 | 3590 (98) | 2659 (98) | 1754 (96) | 1593 (98) |
| 4 | 1838 (95) | 1377 (95) | 924 (91) | 816 (96) |
| 6 | 1342 (87) | 1045 (83) | 649 (86) | 589 (88) |
| 8 | 991 (88) | 775 (84) | 490 (86) | 424 (92) |
| 10 | 799 (88) | 613 (85) | 402 (83) | 343 (91) |
| 12 | 713 (82) | 531 (82) | 336 (83) | 282 (92) |
| 14 | 578 (87) | 456 (82) | 292 (82) | 243 (92) |
| 16 | 525 (83) | 399 (82) | 264 (79) | 214 (91) |
| 20 | 433 (81) | 343 (76) | 218 (77) | 181 (86) |
| 24 | 375 (78) | 290 (75) | 186 (75) | 151 (86) |
| 28 | 321 (78) | 273 (68) | 167 (72) | 129 (86) |
| 32 | 295 (74) | 265 (61) | 147 (71) | 116 (84) |
| 36 | 255 (76) | 248 (58) | 139 (67) | 103 (84) |
| 40 | 243 (72) | 238 (55) | 129 (65) | 93 (84) |
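
The 70% guideline can be applied mechanically to a set of timings like those listed for Benchmark 2. The sketch below is illustrative only: the helper name `max_procs_at_efficiency` is hypothetical, efficiency is assumed to be T(1) / (N × T(N)), and the input is the o2200 gige column of Benchmark 2.

```python
def max_procs_at_efficiency(times, threshold=70.0):
    """Return the largest processor count whose efficiency meets the threshold.

    `times` maps processor count -> wallclock seconds and must include
    the single-processor run (key 1). Efficiency is assumed to be
    100 * T(1) / (N * T(N)).
    """
    t_serial = times[1]
    qualifying = [
        n_procs
        for n_procs, seconds in times.items()
        if 100.0 * t_serial / (n_procs * seconds) >= threshold
    ]
    return max(qualifying)


# o2200 gige wallclock times in seconds (Benchmark 2), keyed by processor count.
o2200_times = {
    1: 5207, 2: 2659, 4: 1377, 6: 1045, 8: 775, 10: 613, 12: 531, 14: 456,
    16: 399, 20: 343, 24: 290, 28: 273, 32: 265, 36: 248, 40: 238,
}

print(max_procs_at_efficiency(o2200_times))
```

Note that this returns the largest qualifying count even if a smaller count dipped below the threshold, so it is worth inspecting the full efficiency column as well when scaling is uneven.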