Preparing HPC Applications for the Exascale Era: A Decoupling Strategy
Ivy Bo Peng, Roberto Gioiosa, Gokcen Kestor, Erwin Laure, Stefano Markidis

TL;DR
This paper introduces a decoupling strategy for HPC applications that separates operations into process groups and uses dataflow processing, significantly improving scalability and performance on large supercomputers.
Contribution
It presents a novel decoupling approach with a proof-of-concept MPI implementation that enhances scalability by pipelining diverse operations in large-scale applications.
Findings
Achieves up to 4x performance improvement on 8,192 processes
Reduces load imbalance impact and increases parallel efficiency
Effective in scientific and data-analytics applications
Abstract
Production-quality parallel applications are often a mixture of diverse operations, such as computation- and communication-intensive, regular and irregular, tightly coupled and loosely linked operations. In conventional construction of parallel applications, each process performs all the operations, which can be inefficient and seriously limit scalability, especially at large scale. We propose a decoupling strategy to improve the scalability of applications running on large-scale systems. Our strategy separates application operations onto groups of processes and enables a dataflow processing paradigm among the groups. This mechanism reduces the impact of load imbalance and increases parallel efficiency by pipelining multiple operations. We provide a proof-of-concept implementation using MPI, the de facto programming system on current supercomputers. We…
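The decoupling idea in the abstract can be illustrated with a minimal sketch. This is not the paper's MPI implementation: it assumes two Python threads stand in for two process groups and a bounded `queue.Queue` stands in for the dataflow channel between them. The names `compute_group`, `analysis_group`, and `run_decoupled`, and the per-step data, are hypothetical and chosen only for illustration.

```python
import queue
import threading

SENTINEL = None  # marks the end of the data stream (illustrative convention)


def compute_group(n_steps, out_q):
    """Stand-in for the group that performs the first operation."""
    for step in range(n_steps):
        # Hypothetical per-step result; a real application would compute here.
        data = [step * 10 + i for i in range(4)]
        # Hand the result downstream, then move on to the next step
        # without performing the second operation itself.
        out_q.put((step, data))
    out_q.put(SENTINEL)


def analysis_group(in_q, results):
    """Stand-in for the group that performs the second, decoupled operation."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            break
        step, data = item
        results[step] = sum(data)  # hypothetical reduction over the step's data


def run_decoupled(n_steps):
    # A bounded queue lets the two groups overlap (pipeline) their work
    # while limiting how far the producer can run ahead.
    channel = queue.Queue(maxsize=2)
    results = {}
    producer = threading.Thread(target=compute_group, args=(n_steps, channel))
    consumer = threading.Thread(target=analysis_group, args=(channel, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results
```

Calling `run_decoupled(3)` returns `{0: 6, 1: 46, 2: 86}`. In an MPI setting, the two roles would more plausibly be separate communicators (for example, created by splitting the world communicator), with the queue replaced by messages between the groups; that mapping is an assumption here, not a description of the authors' code.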
