PGMPI: Automatically Verifying Self-Consistent MPI Performance Guidelines
Sascha Hunold, Alexandra Carpen-Amarie, Felix Donatus Lübbe, Jesper Larsson Träff

TL;DR
PGMPI is an automated framework that verifies whether MPI libraries adhere to self-consistent performance guidelines for collective operations, helping identify and fix performance issues in large-scale parallel systems.
Contribution
This paper introduces PGMPI, a novel automated benchmarking framework for detecting violations of MPI performance guidelines, aiding developers in optimizing MPI library performance.
Findings
PGMPI successfully detects performance guideline violations.
It helps identify unexpected performance degradations.
Adapting algorithms can overcome identified performance issues.
Abstract
The Message Passing Interface (MPI) is the most commonly used application programming interface for process communication on current large-scale parallel systems. Due to the scale and complexity of modern parallel architectures, it is becoming increasingly difficult to optimize MPI libraries, as many factors can influence the communication performance. To assist MPI developers and users, we propose an automatic way to check whether MPI libraries respect self-consistent performance guidelines for collective communication operations. We introduce the PGMPI framework to detect violations of performance guidelines through benchmarking. Our experimental results show that PGMPI can pinpoint undesired and often unexpected performance degradations of collective MPI operations. We demonstrate how to overcome performance issues of several libraries by adapting the algorithmic implementations of…
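The sketch below illustrates what checking a self-consistent performance guideline can look like. It is written in Python and is not PGMPI's actual implementation. It assumes a guideline of the form MPI_Allreduce(n) ≤ MPI_Reduce(n) + MPI_Bcast(n): a specialized collective should be no slower than a "mock-up" that emulates it with other collectives. It also assumes the runtimes have already been measured. The function `check_guideline`, the `GuidelineResult` type, the 5% tolerance, and all timing values are hypothetical. Comparing medians is a simplifying choice, and a real framework may use a statistical test instead.

```python
from dataclasses import dataclass
from statistics import median
from typing import Sequence


@dataclass
class GuidelineResult:
    name: str
    ratio: float    # median(specialized) / median(mock-up)
    violated: bool  # True if the specialized operation is slower beyond tolerance


def check_guideline(name: str,
                    specialized: Sequence[float],
                    mockup: Sequence[float],
                    tolerance: float = 0.05) -> GuidelineResult:
    """Hypothetical check: flag a violation when the specialized collective's
    median runtime exceeds the mock-up's median by more than `tolerance`."""
    if not specialized or not mockup:
        raise ValueError("need at least one measurement on each side")
    mock_median = median(mockup)
    if mock_median <= 0:
        raise ValueError("mock-up runtimes must be positive")
    ratio = median(specialized) / mock_median
    return GuidelineResult(name, ratio, ratio > 1.0 + tolerance)


# Hypothetical runtimes in microseconds, one entry per repetition.
allreduce = [120.0, 118.5, 125.2, 119.8, 121.1]
reduce_ = [50.1, 49.8, 51.2, 50.5, 50.0]
bcast = [40.2, 39.9, 41.0, 40.4, 40.1]

# Mock-up of Allreduce as Reduce followed by Bcast. Summing per-repetition
# times of separately measured operations is a simplifying assumption; timing
# the two calls back to back as one unit would be more faithful.
mockup = [r + b for r, b in zip(reduce_, bcast)]

result = check_guideline("Allreduce <= Reduce + Bcast", allreduce, mockup)
print(f"{result.name}: ratio={result.ratio:.2f}, violated={result.violated}")
# prints "Allreduce <= Reduce + Bcast: ratio=1.33, violated=True"
```

In this made-up data, the native Allreduce is about 33% slower than its mock-up. A library that behaves this way would be a candidate for replacing its Allreduce algorithm with the mock-up's approach.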
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Scientific Computing and Data Management
