A Generalization of the Allreduce Operation
Dmitry Kolmakov, Xuecang Zhang

TL;DR
This paper introduces a novel permutation-based framework inspired by group theory to generalize Allreduce algorithms, effectively handling any number of processes and optimizing communication steps for latency and bandwidth.
Contribution
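The paper presents a new mathematical description of Allreduce communication based on permutation groups. This description yields algorithms that work for any process count, including the non-power-of-two counts that degrade the performance of many existing algorithms.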
Findings
Successfully generalizes Allreduce algorithms for any number of processes.
Achieves latency-optimal and bandwidth-optimal communication steps.
Handles non-power-of-two process counts effectively.
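To make "latency-optimal" and "bandwidth-optimal" concrete, the sketch below tabulates commonly cited textbook costs for the classic Ring and Recursive Doubling algorithms, alongside the usual lower bounds. These formulas are standard figures from the Allreduce literature and are not taken from this summary. The function names and the vector-size parameter n are hypothetical, chosen only for illustration.

```python
def ring_cost(p, n):
    """Classic Ring Allreduce: 2(p-1) steps, each moving a chunk of n/p elements."""
    return {"steps": 2 * (p - 1), "sent": 2 * (p - 1) * n / p}


def recursive_doubling_cost(p, n):
    """Classic Recursive Doubling: log2(p) steps, each moving the full vector.

    Assumption: the textbook form shown here requires a power-of-two p.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("textbook Recursive Doubling needs a power-of-two p")
    steps = p.bit_length() - 1  # exact log2 for powers of two
    return {"steps": steps, "sent": steps * n}


def lower_bounds(p, n):
    """Commonly cited lower bounds: ceil(log2 p) steps, 2(p-1)/p * n elements sent."""
    return {"steps": (p - 1).bit_length(), "sent": 2 * (p - 1) * n / p}


if __name__ == "__main__":
    for p in (6, 8):
        print(p, ring_cost(p, 1024), lower_bounds(p, 1024))
```

Under these figures, Ring meets the bandwidth bound but needs many steps, while Recursive Doubling meets the step bound but sends more data and, in its textbook form, rejects non-power-of-two process counts such as p = 6.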
Abstract
Allreduce is one of the most frequently used MPI collective operations, and thus its performance has attracted much attention over the past decades. Many algorithms have been developed with different properties and purposes. We present a novel approach to describing communication, based on permutations and inspired by the mathematics of a Rubik's cube, where the moves form a mathematical structure called a group. Similarly, cyclic communication patterns between a set of processes may be described by a permutation group. This new approach allows us to construct a generalization of widely used Allreduce algorithms such as Ring, Recursive Doubling and Recursive Halving. Using the developed approach, we build an algorithm that solves the well-known problem of a non-power-of-two number of processes, which degrades the performance of many existing algorithms. The proposed algorithm…
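As an illustration of a cyclic communication pattern described by a permutation, here is a minimal simulation of the classic Ring Allreduce, in which every step uses the same cyclic shift i -> (i + 1) mod p. This is a sketch of the well-known Ring algorithm only, not the generalized algorithm proposed in the paper. The function names are hypothetical, and each process is assumed to hold a vector of exactly p chunks, one number per chunk.

```python
def ring_permutation(p):
    """Cyclic shift: process i sends to process (i + 1) % p."""
    return [(i + 1) % p for i in range(p)]


def ring_allreduce(vectors):
    """Simulate a sum-Allreduce over p processes using only the cyclic shift.

    Assumption: vectors[i] is the input of process i and has exactly p chunks.
    """
    p = len(vectors)
    dest = ring_permutation(p)
    data = [list(v) for v in vectors]

    # Reduce-scatter: after p - 1 steps, process i holds the full sum of
    # chunk (i + 1) % p.
    for step in range(p - 1):
        # Build all messages first so every send within a step is simultaneous.
        msgs = [(dest[i], (i - step) % p, data[i][(i - step) % p])
                for i in range(p)]
        for d, chunk, value in msgs:
            data[d][chunk] += value

    # Allgather: circulate the finished chunks around the same ring.
    for step in range(p - 1):
        msgs = [(dest[i], (i + 1 - step) % p, data[i][(i + 1 - step) % p])
                for i in range(p)]
        for d, chunk, value in msgs:
            data[d][chunk] = value

    return data


if __name__ == "__main__":
    p = 5  # a non-power-of-two process count
    inputs = [[10 * r + c for c in range(p)] for r in range(p)]
    print(ring_allreduce(inputs)[0])
```

Because the shift is defined modulo p, the same code runs unchanged for any process count. Applying the shift p times returns every process index to itself, which is the sense in which the shift generates a cyclic group of order p.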
Taxonomy
Topics: Distributed systems and fault tolerance · Optimization and Search Problems · DNA and Biological Computing
