Gromacs benchmarks

GROMACS v 3.3.1
July 2006

Benchmark 1: 1024 DPPC lipids with 23 water molecules per lipid, totalling 121,856 atoms. A twin-range, group-based cut-off is used: 1.8 nm for electrostatics and 1.0 nm for Lennard-Jones interactions. The long-range contribution to electrostatics is updated every 10 steps. 5000 steps = 10 ps.
As others have observed, this is a particularly scalable benchmark.

dppc_3

Wallclock time in seconds (Efficiency %)

Node types:
  p2800,gige    2.8 GHz Xeon, 1 Gb/s ethernet
  p2800,myr2k   2.8 GHz Xeon, 2 Gb/s Myrinet
  o2200,gige    2.2 GHz Opteron, 1 Gb/s ethernet
  o2200,myr2k   2.2 GHz Opteron, 2 Gb/s Myrinet
  o2800,gige    2.8 GHz Opteron, 1 Gb/s ethernet
  o2800,ipath   2.8 GHz Opteron, 10 Gb/s Infinipath

(- = no result reported)

# processors  p2800,gige   p2800,myr2k  o2200,gige   o2200,myr2k  o2800,gige   o2800,ipath
1             5938 (100)   6009 (100)   4200 (100)   4275 (100)   3343 (100)   2683 (100)
2             2254 (132)   2271 (132)   1979 (106)   1905 (112)   1520 (110)   1157 (116)
4             1700 (87)    1168 (129)   1247 (84)    1029 (104)   1044 (80)    596 (112)
6             -            839 (119)    -            742 (96)     924 (60)     413 (108)
8             -            651 (115)    -            563 (95)     -            309 (108)
10            -            550 (109)    -            487 (88)     -            263 (102)
12            -            483 (104)    -            434 (82)     -            226 (99)
14            -            442 (97)     -            394 (78)     -            200 (96)
16            -            410 (92)     -            355 (75)     -            178 (94)
18            -            407 (82)     -            346 (69)     -            166 (90)
20            -            376 (80)     -            335 (64)     -            158 (85)
24            -            371 (67)     -            337 (53)     -            155 (72)
28            -            361 (59)     -            337 (45)     -            141 (68)
32            -            363 (52)     -            337 (40)     -            139 (60)

The o2800 ipath (2.8 GHz Opterons, 10 Gb/s Infinipath) runs were performed with a 64-bit build of Gromacs compiled with the Pathscale compilers. All other runs used 32-bit builds of Gromacs compiled with gcc and mpich.
                100 * Time on 1 processor
Efficiency =    ---------------------------    
                  n * Time on n processors
Note the consistent superlinear scaling, where an x-processor job runs more than x times as fast as a 1-processor job on the same type of node, so the efficiency is > 100%. This has been observed by other groups: "GROMACS has a communications intensive benchmark that can experience superlinear performance. When partitioned across multiple nodes, a larger portion of the simulation data can reside in L2 cache, reducing the amount of main memory accesses."
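As a minimal sketch of the efficiency formula above, the following Python function reproduces one entry of the Benchmark 1 table. The function name `efficiency` is illustrative only and is not part of Gromacs; the timings are the p2800,myr2k values for 1 and 8 processors.

```python
def efficiency(t1: float, n: int, tn: float) -> float:
    """Parallel efficiency in percent: 100 * T(1) / (n * T(n))."""
    if n <= 0 or tn <= 0:
        raise ValueError("n and tn must be positive")
    return 100.0 * t1 / (n * tn)


# p2800,myr2k: 6009 s on 1 processor, 651 s on 8 processors.
eff = efficiency(6009, 8, 651)
print(f"{eff:.0f}%")  # prints "115%", i.e. superlinear (> 100%)
```

A value above 100 means the n-processor run is more than n times faster than the 1-processor run.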

Bottom line:

The same benchmark, reported in terms of ns/day and speedup.

ns/day (Speedup)

Node types:
  p2800,gige    2.8 GHz Xeon, 1 Gb/s ethernet
  p2800,myr2k   2.8 GHz Xeon, 2 Gb/s Myrinet
  o2200,gige    2.2 GHz Opteron, 1 Gb/s ethernet
  o2200,myr2k   2.2 GHz Opteron, 2 Gb/s Myrinet
  o2800,gige    2.8 GHz Opteron, 1 Gb/s ethernet
  o2800,ipath   2.8 GHz Opteron, 10 Gb/s Infinipath

(- = no result reported)

# processors  p2800,gige    p2800,myr2k   o2200,gige    o2200,myr2k   o2800,gige    o2800,ipath
1             0.15          0.14          0.21          0.20          0.26          0.32
2             0.38 (2.63)   0.38 (2.64)   0.437 (2.12)  0.45 (2.27)   0.57 (2.20)   0.75 (2.32)
4             0.51 (3.48)   0.74 (5.14)   0.693 (3.36)  0.84 (4.20)   0.83 (3.21)   1.45 (4.50)
6             -             1.03 (7.15)   -             1.16 (5.82)   0.94 (3.62)   2.09 (6.50)
8             -             1.33 (9.22)   -             1.54 (7.68)   -             2.80 (8.68)
10            -             1.57 (10.91)  -             1.77 (8.87)   -             3.29 (10.20)
12            -             1.79 (12.42)  -             1.99 (9.96)   -             3.82 (11.87)
14            -             1.96 (13.58)  -             2.19 (10.97)  -             4.32 (13.42)
16            -             2.11 (14.63)  -             2.43 (12.17)  -             4.85 (15.07)
18            -             2.12 (14.74)  -             2.50 (12.49)  -             5.21 (16.16)
20            -             2.30 (15.96)  -             2.58 (12.90)  -             5.47 (16.98)
24            -             2.33 (16.17)  -             2.56 (12.82)  -             5.57 (17.31)
28            -             2.39 (16.62)  -             2.56 (12.82)  -             6.13 (19.03)
32            -             2.38 (16.53)  -             2.56 (12.82)  -             6.22 (19.30)
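The ns/day and speedup figures can be related to the wallclock times with a short Python sketch. It assumes that ns/day is the simulated time (10 ps = 0.010 ns for the 5000-step run) scaled to 86,400 seconds per day, and that speedup is the 1-processor time divided by the n-processor time. The function and constant names are illustrative. Because the published times are rounded, some table entries may differ from this calculation in the last digit.

```python
SECONDS_PER_DAY = 86_400
SIMULATED_NS = 0.010  # 5000 steps = 10 ps = 0.010 ns (from the benchmark description)


def ns_per_day(wallclock_s: float, simulated_ns: float = SIMULATED_NS) -> float:
    """Simulated nanoseconds per day of wallclock time."""
    return simulated_ns * SECONDS_PER_DAY / wallclock_s


def speedup(t1: float, tn: float) -> float:
    """Ratio of the 1-processor time to the n-processor time."""
    return t1 / tn


# o2800,ipath: 2683 s on 1 processor, 139 s on 32 processors.
print(f"{ns_per_day(139):.2f} ns/day")       # prints "6.22 ns/day"
print(f"{speedup(2683, 139):.2f}x speedup")  # prints "19.30x speedup"
```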


Benchmark 2: courtesy of Yinon Shafrir (NCI).

dharlor
Wallclock time (Efficiency %)

Node types:
  p2800,myr2k   2.8 GHz Xeon, 2 Gb/s Myrinet
  o2200,myr2k   2.2 GHz Opteron, 2 Gb/s Myrinet
  o2800,ipath   2.8 GHz Opteron, 10 Gb/s Infinipath

# processors  p2800,myr2k  o2200,myr2k  o2800,ipath
1             2012 (100)   1927 (100)   1143 (100)
2             1038 (97)    969 (99)     596 (96)
4             566 (89)     538 (90)     321 (89)
6             378 (89)     361 (89)     211 (90)
8             310 (81)     295 (82)     167 (86)
12            231 (73)     227 (71)     132 (72)
16            214 (59)     189 (64)     105 (68)
20            165 (61)     161 (60)     94 (61)
24            161 (52)     153 (52)     85 (56)
28            172 (42)     145 (47)     84 (49)

Summary: For this job, there is no significant difference between running on the o2200 Myrinet nodes and the p2800 Myrinet nodes. The o2800 Infinipath nodes are the fastest and so will give the shortest run time. The efficiency is similar on all 3 types of nodes, and the job can be run on up to 12 processors (6 nodes) with efficiency > 70%.
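The 12-processor recommendation can be checked with a small Python sketch that takes the reported efficiencies from the Benchmark 2 table and finds the largest processor count at which every node type stays above a threshold. The dictionary layout and the helper name `max_processors` are illustrative assumptions, not part of any Gromacs tool.

```python
# Reported efficiencies (%) from the Benchmark 2 table:
# processors -> (p2800,myr2k, o2200,myr2k, o2800,ipath)
EFFICIENCY = {
    1: (100, 100, 100),
    2: (97, 99, 96),
    4: (89, 90, 89),
    6: (89, 89, 90),
    8: (81, 82, 86),
    12: (73, 71, 72),
    16: (59, 64, 68),
    20: (61, 60, 61),
    24: (52, 52, 56),
    28: (42, 47, 49),
}


def max_processors(table, threshold):
    """Largest processor count at which every node type exceeds threshold."""
    ok = [n for n, effs in table.items() if min(effs) > threshold]
    return max(ok) if ok else None


print(max_processors(EFFICIENCY, 70))  # prints 12
```

Raising the threshold to 80% gives 8 processors under the same data.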

As with other benchmarks, Gromacs scales very poorly on nodes without Myrinet or Infinipath, so on those nodes it should be run on no more than 1 node (2 processors). (Results not reported in the graph and table above.)


Updated 14 July 2006