Achieving Efficient Strong Scaling with PETSc using Hybrid MPI/OpenMP Optimisation
Michael Lange, Gerard Gorman, Michele Weiland, Lawrence Mitchell, and James Southern

TL;DR
This paper demonstrates how hybrid MPI/OpenMP parallelisation improves the strong scaling and performance of sparse matrix-vector multiplication in PETSc on modern supercomputers, enabling efficient large-scale computations.
Contribution
It introduces hybrid MPI/OpenMP optimisation techniques for PETSc, including communication overlap and thread load balancing, to enhance scalability and performance.
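Thread load balancing for sparse matrix-vector multiplication can be illustrated with a small sketch. It assumes a CSR row-pointer array and uses the number of nonzeros per thread as the measure of work, which is a common choice but is not confirmed by this summary as the paper's exact scheme. The function names `balance_rows_by_nnz` and `nnz_per_thread` are hypothetical and not part of PETSc.

```python
from bisect import bisect_left


def balance_rows_by_nnz(row_ptr, num_threads):
    """Return num_threads + 1 row boundaries so that each thread's
    contiguous row range holds roughly the same number of nonzeros.

    row_ptr is the CSR row-pointer array (length n_rows + 1).
    Assumption: work per row is proportional to its nonzero count.
    """
    n_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    bounds = [0]
    for t in range(1, num_threads):
        target = total_nnz * t / num_threads
        # First row whose starting offset reaches the target; searching
        # from the previous boundary keeps the boundaries non-decreasing.
        cut = bisect_left(row_ptr, target, lo=bounds[-1], hi=n_rows)
        bounds.append(cut)
    bounds.append(n_rows)
    return bounds


def nnz_per_thread(row_ptr, bounds):
    """Number of nonzeros assigned to each thread for the given boundaries."""
    return [row_ptr[bounds[i + 1]] - row_ptr[bounds[i]]
            for i in range(len(bounds) - 1)]


# Six rows with 8, 1, 1, 1, 1 and 4 nonzeros respectively.
row_ptr = [0, 8, 9, 10, 11, 12, 16]
bounds = balance_rows_by_nnz(row_ptr, 2)      # [0, 1, 6]
balanced = nnz_per_thread(row_ptr, bounds)    # [8, 8]
naive = nnz_per_thread(row_ptr, [0, 3, 6])    # [10, 6] for an equal-row split
```

In this toy case, splitting by row count gives one thread 10 nonzeros and the other 6, while splitting by nonzero count gives each thread 8.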
Findings
Significant speedup over pure-MPI mode.
Efficient strong scaling on Fujitsu PRIMEHPC FX10 and Cray XE6.
Effective use of task-based parallelism for communication overlap.
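The idea of overlapping communication with computation can be sketched in a few lines. The sketch assumes each process splits its rows into a "diagonal" block that touches only locally owned vector entries and an "off-diagonal" block that needs entries owned elsewhere. A Python worker thread stands in for the halo exchange, so this illustrates the ordering of the steps rather than the MPI/OpenMP implementation in the paper. All function names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def csr_matvec_add(row_ptr, col_idx, vals, x, y):
    """Accumulate y += A @ x for a CSR matrix A = (row_ptr, col_idx, vals)."""
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[col_idx[k]]
        y[i] += acc
    return y


def fetch_ghost_values(remote_x, ghost_ids):
    """Stand-in for the halo exchange: gather the remote entries we need."""
    return [remote_x[g] for g in ghost_ids]


def overlapped_spmv(diag, offdiag, x_local, remote_x, ghost_ids):
    """Compute y = A_diag @ x_local + A_offdiag @ x_ghost, starting the
    (simulated) communication before the local product so the two overlap."""
    n_local_rows = len(diag[0]) - 1
    y = [0.0] * n_local_rows
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_ghost_values, remote_x, ghost_ids)
        csr_matvec_add(*diag, x_local, y)   # needs no remote data
        x_ghost = pending.result()          # wait for the exchange to finish
    csr_matvec_add(*offdiag, x_ghost, y)    # uses the received entries
    return y


# Two local rows of a 4-column matrix; this process owns columns 0 and 1.
#   row 0 = [2, 1, 0, 3], row 1 = [0, 4, 5, 0], x = [1, 2, 3, 4]
diag = ([0, 2, 3], [0, 1, 1], [2.0, 1.0, 4.0])
offdiag = ([0, 1, 2], [1, 0], [3.0, 5.0])   # columns index into ghost_ids
y = overlapped_spmv(diag, offdiag, [1.0, 2.0], [1.0, 2.0, 3.0, 4.0], [2, 3])
```

The diagonal-block product is independent of the exchange, which is what makes the overlap possible; only the off-diagonal product has to wait for remote data.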
Abstract
The increasing number of processing elements and decreasing memory-to-core ratio in modern high-performance platforms make efficient strong scaling a key requirement for numerical algorithms. In order to achieve efficient scalability on massively parallel systems, scientific software must evolve across the entire stack to exploit the multiple levels of parallelism exposed in modern architectures. In this paper we demonstrate the use of hybrid MPI/OpenMP parallelisation to optimise parallel sparse matrix-vector multiplication in PETSc, a widely used scientific library for the scalable solution of partial differential equations. Using large matrices generated by Fluidity, an open-source CFD application code which uses PETSc as its linear solver engine, we evaluate the effect of explicit communication overlap using task-based parallelism and show how to further improve performance by…
