Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software
Måns I. Andersson, N. Arul Murugan, Artur Podobas, Stefano Markidis

TL;DR
This paper analyzes the parallel performance of GROMACS, a widely used molecular dynamics software, by examining computational phases, configurations, and FFT libraries to identify bottlenecks and opportunities for optimization.
Contribution
It provides a detailed breakdown of GROMACS's computational phases and identifies non-scalable stages, offering insights for performance improvements, especially in FFT calculations.
Findings
MPI communication during 3D FFT limits scalability
Particle-Mesh Ewald and 3D FFT are performance bottlenecks
Performance varies with different FFT libraries
Abstract
GROMACS is one of the most widely used HPC software packages based on the Molecular Dynamics (MD) simulation technique. In this work, we quantify the parallel performance of GROMACS using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact GROMACS performance. Finally, we discuss opportunities for performance improvement, with a particular interest in further developing the FFT calculations in GROMACS.
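To illustrate why the 3D FFT involves communication at all, the following is a minimal Python sketch, not GROMACS code, of the standard way a 3D discrete Fourier transform is built from 1D transforms along each axis. All function names (`dft_1d`, `fft3d_by_axes`, `dft3d_direct`, `max_abs_diff`) and the small test grid are hypothetical and chosen only for this illustration. In a distributed run the grid is split across MPI ranks, so at least one of the three passes operates on lines whose data lives on other ranks; the data must first be redistributed with a global transpose (typically an all-to-all exchange). The sketch runs in a single process and only marks where that exchange would occur.

```python
import cmath


def dft_1d(values):
    """Naive O(n^2) 1D discrete Fourier transform of a sequence."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft3d_by_axes(grid):
    """3D DFT computed as 1D transforms along z, then y, then x."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[complex(v) for v in row] for row in plane] for plane in grid]
    # Pass 1: lines along z. With the grid split over x (and possibly y),
    # every z-line is local to one rank, so no communication is needed.
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft_1d(out[x][y])
    # Pass 2: lines along y.
    for x in range(nx):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = line[y]
    # Pass 3: lines along x. Assumption for this sketch: x is the axis that
    # is distributed across ranks, so a parallel code would perform a
    # global transpose (all-to-all exchange) before this pass.
    for y in range(ny):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = line[x]
    return out


def dft3d_direct(grid):
    """Reference 3D DFT evaluated straight from the definition."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx in range(nx):
        for ky in range(ny):
            for kz in range(nz):
                total = 0j
                for x in range(nx):
                    for y in range(ny):
                        for z in range(nz):
                            phase = kx * x / nx + ky * y / ny + kz * z / nz
                            total += grid[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
                out[kx][ky][kz] = total
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two 3D grids."""
    return max(
        abs(a[x][y][z] - b[x][y][z])
        for x in range(len(a))
        for y in range(len(a[0]))
        for z in range(len(a[0][0]))
    )


# Small made-up 2 x 3 x 4 grid, used only to check the decomposition.
grid = [[[float((x + 2 * y + 3 * z) % 5) for z in range(4)]
         for y in range(3)] for x in range(2)]

error = max_abs_diff(fft3d_by_axes(grid), dft3d_direct(grid))
print("max difference between axis-wise and direct 3D DFT:", error)
```

The axis-wise result agrees with the direct definition up to floating-point rounding. A production code would use an optimized FFT library for the 1D transforms; the naive transform here only keeps the sketch dependency-free.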
Taxonomy
Topics: Advanced Data Storage Technologies · Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems
