Efficient and Eventually Consistent Collective Operations
Roman Iakymchuk, Amandio Faustino, Andrew Emerson, Joao Barreto, Valeria Bartsch, Rodrigo Rodrigues, Jose C. Monteiro

TL;DR
This paper introduces an efficient, eventually consistent approach to collective operations in parallel computing, reducing communication overhead and improving performance for ML/DL and HPC applications, especially in strong scaling scenarios.
Contribution
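It proposes a design for eventually consistent collectives that reduces communication in Broadcast and Reduce and applies the Stale Synchronous Parallel (SSP) model to Allreduce. It also adds classic collectives (Allreduce for large messages, AlltoAll) to GASPI, with promising preliminary performance results.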
Findings
Significant improvements in Allreduce and AlltoAll performance
Reduced communication in Broadcast and Reduce operations
Enhanced GASPI ecosystem with new collective implementations
Abstract
Collective operations are common features of parallel programming models that are frequently used in High-Performance Computing (HPC) and machine/deep learning (ML/DL) applications. In strong scaling scenarios, collective operations can negatively impact the overall application performance: as the core count increases, the load per rank decreases, while the time spent in collective operations increases logarithmically. In this article, we propose a design for eventually consistent collectives suitable for ML/DL computations by reducing communication in Broadcast and Reduce, as well as by exploring the Stale Synchronous Parallel (SSP) synchronization model for the Allreduce collective. We also enrich the GASPI ecosystem with frequently used classic/consistent collective operations, such as Allreduce for large messages and AlltoAll used in an HPC code. Our implementations show…
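The SSP model mentioned for Allreduce relaxes the bulk-synchronous barrier: instead of waiting for every rank at each step, a rank may run ahead of the slowest rank by a bounded number of iterations (the staleness bound s). The Python sketch below models only that synchronization rule, not the paper's GASPI Allreduce. The function `simulate`, the per-worker iteration periods in `speeds`, and the discrete tick model are hypothetical illustrations. With s = 0 the rule reduces to bulk-synchronous behavior.

```python
def simulate(speeds, staleness, ticks):
    """Simulate the Stale Synchronous Parallel (SSP) progress rule.

    Hypothetical model: worker w needs speeds[w] ticks per iteration.
    clocks[w] counts completed iterations. A worker may start iteration c
    only if every worker has completed at least c - staleness iterations,
    i.e. clocks[w] - min(clocks) <= staleness.
    """
    clocks = [0] * len(speeds)    # completed iterations per worker
    progress = [0] * len(speeds)  # ticks spent on the current iteration
    for _ in range(ticks):
        slowest = min(clocks)  # snapshot of the slowest worker this tick
        for w, period in enumerate(speeds):
            if clocks[w] - slowest > staleness:
                continue  # blocked: too far ahead of the slowest worker
            progress[w] += 1
            if progress[w] == period:  # iteration finished
                progress[w] = 0
                clocks[w] += 1
    return clocks


if __name__ == "__main__":
    # One fast worker (1 tick/iteration) and one slow worker (3 ticks/iteration).
    print(simulate([1, 3], staleness=0, ticks=6))  # bulk-synchronous: [2, 2]
    print(simulate([1, 3], staleness=2, ticks=6))  # SSP with s = 2: [4, 2]
```

Raising the staleness bound lets the fast worker complete more iterations in the same time, at the cost of computing with aggregated values that may be up to s iterations old.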
