Improving all-reduce collective operations for imbalanced process arrival patterns
Jerzy Proficz

TL;DR
This paper introduces two all-reduce algorithms, sorted linear tree (SLT) and pre-reduced ring (PRR), optimized for imbalanced process arrival patterns (PAPs). In the reported experiments they scale well and outperform the commonly used ring and Rabenseifner algorithms.
Contribution
The paper presents two new all-reduce algorithms (SLT and PRR) together with an online PAP detection method that estimates process arrival times (PATs) and distributes the estimates among the cooperating processes.
Findings
High scalability demonstrated in experiments
Performance improvements over ring and Rabenseifner algorithms
Effective online process arrival detection
Abstract
Two new algorithms for the all-reduce operation, optimized for imbalanced process arrival patterns (PAPs), are presented: (i) sorted linear tree (SLT) and (ii) pre-reduced ring (PRR). A new method of online PAP detection is also introduced, including process arrival time (PAT) estimation and the distribution of the estimates among cooperating processes. The idea, pseudo-code, implementation details, a benchmark for performance evaluation, and a real-case example for machine learning are provided. The experimental results are described and analyzed, showing that the proposed solution has high scalability and improved performance in comparison with the commonly used ring and Rabenseifner algorithms.
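To make the notion of an imbalanced process arrival pattern concrete, the following is a minimal sketch of a toy cost model. It is not the paper's SLT or PRR pseudo-code, which this summary does not reproduce. The function names, the arrival times, and the fixed per-hop cost are all assumptions chosen for illustration. The sketch shows what an all-reduce computes and why the order in which processes join a linear reduction chain can matter when one process arrives late.

```python
from typing import List, Sequence


def allreduce_sum(values: Sequence[float]) -> List[float]:
    """Reference semantics of all-reduce with sum: every process gets the total."""
    total = sum(values)
    return [total] * len(values)


def chain_finish_time(arrivals: Sequence[float],
                      order: Sequence[int],
                      hop_cost: float) -> float:
    """Completion time of a linear-chain reduction under a toy cost model.

    The partial result is passed along `order`. A hop can start only when the
    partial result is ready AND the receiving process has arrived; each hop
    then takes `hop_cost` time units (an assumed constant).
    """
    ready = arrivals[order[0]]  # the first process holds the initial partial result
    for rank in order[1:]:
        ready = max(ready, arrivals[rank]) + hop_cost
    return ready


# Hypothetical process arrival times (PATs); rank 0 is the late process.
arrivals = [5.0, 0.0, 1.0, 2.0]

rank_order = list(range(len(arrivals)))                        # fixed order by rank
sorted_order = sorted(rank_order, key=lambda r: arrivals[r])   # order by arrival time

t_rank = chain_finish_time(arrivals, rank_order, hop_cost=1.0)
t_sorted = chain_finish_time(arrivals, sorted_order, hop_cost=1.0)
```

Under these assumptions the rank-ordered chain finishes at time 8.0 and the arrival-sorted chain at time 6.0, because in the sorted chain the early processes complete their hops while the late one is still on its way. The model covers only the reduction phase and ignores message size, network topology, and the step that returns the result to every process.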
