Revisiting the Time Cost Model of AllReduce
Dian Xiong, Li Chen, Youhe Jiang, Dan Li, Shuai Wang, Songtao Wang

TL;DR
This paper updates the traditional AllReduce time cost model by adding two new terms, incast and memory access, which leads to better algorithm design and up to a 1.65x speed-up over NCCL on modern clusters.
Contribution
It introduces GenModel, an augmented time cost model for AllReduce, and proposes GenTree, a new algorithm optimized for tree-like topologies, validated by experiments.
Findings
Augmented the $(\alpha, \beta, \gamma)$ model with incast and memory access terms.
Discovered two new optimality conditions for AllReduce algorithms and proved that they cannot be achieved simultaneously.
GenTree achieves up to 1.65x speed-up over NCCL in real tests.
Abstract
AllReduce is an important and popular collective communication primitive, which has been widely used in areas such as distributed machine learning and high performance computing. To design, analyze, and choose from various algorithms and implementations of AllReduce, the time cost model plays a crucial role, and the predominant one is the $(\alpha, \beta, \gamma)$ model. In this paper, we revisit this model, and reveal that it cannot well characterize the time cost of AllReduce on modern clusters, and thus must be updated. We perform extensive measurements to identify two additional terms contributing to the time cost: the incast term and the memory access term. We augment the $(\alpha, \beta, \gamma)$ model with these two terms, and present GenModel as a result. Using GenModel, we discover two new optimalities for AllReduce algorithms, and prove that they cannot be achieved simultaneously.…
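For readers unfamiliar with the baseline that the abstract revisits, the following is a minimal sketch of the classic $(\alpha, \beta, \gamma)$ estimate for ring AllReduce, a standard textbook instance of that model. It is not taken from the paper. The function name and parameter names are illustrative assumptions, and the incast and memory access terms of GenModel are deliberately left out because their form is not given here.

```python
def ring_allreduce_cost(p, n, alpha, beta, gamma):
    """Classic (alpha, beta, gamma) time estimate for ring AllReduce.

    Illustrative sketch only (hypothetical helper, not from the paper).
    p     -- number of participating processes
    n     -- message size in bytes
    alpha -- per-message latency, in seconds
    beta  -- per-byte transfer time, in seconds per byte
    gamma -- per-byte reduction compute time, in seconds per byte
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    # Reduce-scatter takes p - 1 steps, and all-gather takes another p - 1.
    latency = 2 * (p - 1) * alpha
    # Each step moves n / p bytes, over 2 * (p - 1) steps in total.
    bandwidth = 2 * (p - 1) / p * n * beta
    # Only the p - 1 reduce-scatter steps perform a reduction on n / p bytes.
    compute = (p - 1) / p * n * gamma
    return latency + bandwidth + compute


# Example with made-up parameter values: 8 processes, 64 MiB message.
estimate = ring_allreduce_cost(p=8, n=64 * 2**20, alpha=5e-6, beta=1e-10, gamma=5e-11)
```

The estimate depends only on latency, bandwidth, and compute, which is the limitation the abstract points to: effects such as incast and memory access have no term in this baseline.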
Taxonomy
Topics: Constraint Satisfaction and Optimization · Graph Theory and Algorithms · Distributed and Parallel Computing Systems
