Mixed-mode implementation of PETSc for scalable linear algebra on multi-core processors
Michele Weiland, Lawrence Mitchell, Gerard Gorman, Stephan Kramer, Mark Parsons, James Southern

TL;DR
This paper presents a mixed-mode (MPI + OpenMP) implementation of PETSc to improve scalable linear algebra performance on multi-core processors, demonstrating superior results over pure MPI in CFD applications.
Contribution
It introduces OpenMP threading to PETSc, addressing performance issues and enabling efficient multi-core utilization for scalable linear algebra computations.
Findings
Mixed-mode PETSc outperforms pure MPI in benchmarks.
OpenMP integration improves performance on modern multi-core processors.
Application to CFD matrices shows practical efficiency gains.
Abstract
With multi-core processors a ubiquitous building block of modern supercomputers, it is now past time to enable applications to embrace these developments in processor design. To achieve exascale performance, applications will need ways of exploiting the new levels of parallelism that are exposed in modern high-performance computers. A typical approach to this is to use shared-memory programming techniques to best exploit multi-core nodes along with inter-node message passing. In this paper, we describe the addition of OpenMP threaded functionality to the PETSc library. We highlight some issues that hinder good performance of threaded applications on modern processors and describe how to negate them. The OpenMP branch of PETSc was benchmarked using matrices extracted from Fluidity, a CFD application code, which uses the library as its linear solver engine. The overall performance of the…
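To make the idea of threading the node-local kernels concrete, here is a minimal sketch in C of two operations a Krylov solver spends most of its time in: a sparse matrix-vector product and a dot product. It is not taken from the PETSc OpenMP branch. The function names `csr_spmv` and `local_dot` are hypothetical, and the CSR storage layout is an assumption made for illustration. In a mixed-mode code each MPI rank would call such kernels on the rows it owns, and the partial dot product would then be combined across ranks by a message-passing reduction.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (not PETSc's API): y = A*x for the n locally owned
 * rows of a matrix stored in CSR form (row_ptr, col_idx, val). */
void csr_spmv(int n, const int *row_ptr, const int *col_idx,
              const double *val, const double *x, double *y)
{
    assert(n >= 0);
    /* Each row is independent, so rows are split statically among threads.
     * Without OpenMP enabled the pragma is ignored and the loop runs serially. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
            sum += val[k] * x[col_idx[k]];
        y[i] = sum;
    }
}

/* Hypothetical kernel: thread-parallel dot product of the local vector parts.
 * The reduction clause gives each thread a private partial sum. */
double local_dot(int n, const double *a, const double *b)
{
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum) schedule(static)
    for (int i = 0; i < n; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Built with an OpenMP-capable compiler (for example with `-fopenmp` on GCC or Clang), both loops run across the threads of a node. The static schedule gives each thread the same contiguous block of rows on every call, which keeps a thread working on the same region of memory from one iteration of the solver to the next.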
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Embedded Systems Design Techniques
