Gromacs Benchmarks
GROMACS v 4.0.3
March 2009
Benchmark 1:
1024 DPPC lipids with 23 water molecules per lipid, for a total of
121,856 atoms. A twin-range, group-based cut-off is used: 1.8 nm for electrostatics and 1.0 nm for Lennard-Jones interactions. The long-range
contribution to electrostatics is updated every 10 steps. The run is 5000 steps = 10 ps.
As others have observed, this is a particularly scalable benchmark.
Wallclock time in seconds (efficiency in %) by node type:
| # processors | o2800, gige (2.8 GHz Opteron, 1 Gb/s Ethernet) | e2800, ib (2.8 GHz Intel EM64T, 16 Gb/s Infiniband) | o2800, ipath (2.8 GHz Opteron, 8 Gb/s Infinipath) | o2200, myr2k (2.2 GHz Opteron, 2 Gb/s Myrinet) |
| --- | --- | --- | --- | --- |
| 1 | 2656 (100) | 1734 (100) | 2629 (100) | 4098 (100) |
| 2 | 1039 (128) | 754 (115) | 981 (134) | 1504 (136) |
| 4 | 508 (131) | 382 (114) | 491 (134) | 770 (133) |
| 8 | 294 (113) | 203 (107) | 254 (129) | 399 (128) |
| 16 | 200 (83) | 102 (106) | 128 (128) | 206 (124) |
| 32 | 147 (56) | 53 (102) | 67 (123) | 108 (118) |
| 64 | | 28 (97) | 35 (117) | |
| 128 | | 17 (80) | 20 (103) | |
| 256 | | 12 (56) | | |
Efficiency (%) = 100 * (time on 1 processor) / (n * (time on n processors))
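As a quick check of the efficiency definition, the following minimal sketch recomputes the efficiency column for the gige nodes from the wallclock times in the table above. The function name `efficiency` and the dictionary `gige` are illustrative names chosen here, not part of GROMACS or of the original benchmark scripts.

```python
def efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency in percent: 100 * t1 / (n * tn)."""
    return 100.0 * t1 / (n * tn)


# Wallclock times in seconds for the gige nodes, keyed by processor count.
gige = {1: 2656, 2: 1039, 4: 508, 8: 294, 16: 200, 32: 147}

for n, tn in gige.items():
    print(f"{n:>3} processors: {efficiency(gige[1], tn, n):.0f}%")
```

Rounded to whole percent, the values agree with the efficiencies listed for the gige nodes (for example, 128% on 2 processors and 56% on 32).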
Note the consistent superlinear scaling: for example, a 2-processor job runs more than twice as fast as
a 1-processor job on the same type of node, so the efficiency is > 100%. This
has been observed by other groups:
"GROMACS has a communications intensive benchmark that can experience superlinear performance. When partitioned across multiple nodes, a larger portion of the simulation data can reside in L2 cache, reducing the amount of main memory accesses."
Bottom line:
- The job scales best and is by far the fastest on the Infiniband (IB) nodes. Based on these results, it would be reasonable to run this job on 128 IB processors (16 IB nodes) or 128 Ipath processors (64 Ipath nodes).
- On gige, this job should be run on no more than 16 processors (8 single-core nodes or 4 dual-core nodes).
- The job scales fine on the Myrinet nodes, but these nodes are so much slower that they are only worth using if no IB or Ipath nodes are available.
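One way to read the recommendations above is as a cut-off on parallel efficiency. The sketch below picks, for each node type, the largest measured processor count whose efficiency is still at least 80%. The 80% threshold and the name `max_useful_processors` are assumptions made for illustration; the original text does not state the rule it used. The sketch also assumes that the 64-, 128- and 256-processor measurements belong to the ib and ipath nodes, which is consistent with the efficiency formula.

```python
# Efficiency (%) by processor count, copied from the wallclock table.
EFFICIENCY = {
    "gige": {1: 100, 2: 128, 4: 131, 8: 113, 16: 83, 32: 56},
    "ib": {1: 100, 2: 115, 4: 114, 8: 107, 16: 106, 32: 102,
           64: 97, 128: 80, 256: 56},
    "ipath": {1: 100, 2: 134, 4: 134, 8: 129, 16: 128, 32: 123,
              64: 117, 128: 103},
    "myr2k": {1: 100, 2: 136, 4: 133, 8: 128, 16: 124, 32: 118},
}


def max_useful_processors(eff_by_n: dict, threshold: float = 80.0) -> int:
    """Largest measured processor count with efficiency >= threshold (%)."""
    # The 1-processor entry is always 100%, so the list is never empty.
    return max(n for n, eff in eff_by_n.items() if eff >= threshold)


for node, eff_by_n in EFFICIENCY.items():
    print(f"{node}: {max_useful_processors(eff_by_n)} processors")
```

With this threshold the sketch gives 16 processors on gige and 128 on both ib and ipath, matching the recommendations. For ipath and myr2k the result is simply the largest count that was measured, so it says nothing about scaling beyond that point.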
The same benchmark, reported in terms of ns/day and speedup.
ns/day (speedup) by node type:
| # processors | o2800, gige (2.8 GHz Opteron, 1 Gb/s Ethernet) | e2800, ib (2.8 GHz Intel EM64T, 16 Gb/s Infiniband) | o2800, ipath (2.8 GHz Opteron, 8 Gb/s Infinipath) |
| --- | --- | --- | --- |
| 1 | 0.325 | 0.498 | 0.329 |
| 2 | 0.832 (2.6) | 1.146 (2.3) | 0.881 (2.7) |
| 4 | 1.701 (5.2) | 2.262 (4.5) | 1.760 (5.3) |
| 8 | 2.939 (9.0) | 4.257 (8.5) | 3.402 (10.3) |
| 16 | 5.879 (18.1) | 8.472 (17.0) | 6.751 (20.5) |
| 32 | 9.933 (30.6) | 16.305 (32.7) | 12.898 (39.2) |
| 64 | | 30.863 (62) | 24.691 (75.0) |
| 128 | | 50.834 (102) | 43.209 (131.3) |
| 256 | | 72.014 (144) | |
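The ns/day and speedup figures follow directly from the wallclock times, since each run simulates 5000 steps = 10 ps. The sketch below shows the conversion. The names `ns_per_day` and `speedup` are illustrative, and defining speedup as the ratio of the 1-processor time to the n-processor time is an assumption, since the original does not define it.

```python
SIMULATED_NS = 0.010       # 5000 steps = 10 ps = 0.010 ns per run
SECONDS_PER_DAY = 86_400


def ns_per_day(wallclock_s: float) -> float:
    """Simulated nanoseconds per day of wallclock time."""
    return SIMULATED_NS * SECONDS_PER_DAY / wallclock_s


def speedup(t1: float, tn: float) -> float:
    """Speedup relative to the 1-processor run on the same node type."""
    return t1 / tn


# gige nodes: 2656 s on 1 processor, 1039 s on 2 processors.
print(f"{ns_per_day(2656):.3f} ns/day on 1 processor")
print(f"{ns_per_day(1039):.3f} ns/day on 2 processors, "
      f"speedup {speedup(2656, 1039):.1f}")
```

For the gige nodes this gives 0.325 ns/day on 1 processor and 0.832 ns/day with a speedup of 2.6 on 2 processors, in agreement with the table.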
Benchmarks for Gromacs v 3.3.3 (May 2008)
Benchmarks for Gromacs v 3.3.1 (July 2006)


