Trivance: Latency-Optimal AllReduce by Shortcutting Multiport Networks
Anton Juerss, Vamsi Addanki, Stefan Schmid

TL;DR
Trivance is a new AllReduce algorithm that reduces communication steps and congestion, achieving latency-optimal performance in large-scale distributed systems with bidirectional ring and multidimensional torus topologies.
Contribution
The paper introduces Trivance, an AllReduce algorithm that completes in log_3 n steps by tripling the communication distance at each step, while reducing congestion and remaining bandwidth-optimal.
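The step count can be illustrated with a small simulation. The sketch below is not the authors' schedule; it only assumes that at step k every node on a bidirectional ring receives from the two nodes at distance 3^k (one per direction), and it tracks which contributions node 0 holds after each step. The function names `steps_needed` and `tripling_coverage` are hypothetical helpers introduced for this illustration.

```python
def steps_needed(n: int, base: int) -> int:
    """Smallest k with base**k >= n, computed with integers (no float log)."""
    k, reach = 0, 1
    while reach < n:
        reach *= base
        k += 1
    return k


def tripling_coverage(n: int) -> list[int]:
    """Count the contributions node 0 holds after each step on a ring of n nodes.

    Assumption (illustrative only): at step k each node merges what its
    neighbours at distance 3**k in both ring directions currently hold.
    """
    known = [{i} for i in range(n)]  # every node starts with its own value
    history = []
    dist = 1
    while len(known[0]) < n:
        known = [
            known[i] | known[(i - dist) % n] | known[(i + dist) % n]
            for i in range(n)
        ]
        history.append(len(known[0]))
        dist *= 3  # communication distance triples each step
    return history


if __name__ == "__main__":
    n = 27
    print(tripling_coverage(n))   # [3, 9, 27]
    print(steps_needed(n, 3))     # 3 steps when reach triples per step
    print(steps_needed(n, 2))     # 5 steps when reach doubles per step
```

For n = 27 the set of held contributions grows 3, 9, 27, so three steps suffice, whereas a scheme whose reach only doubles per step needs five. When n is a power of three the merged ranges are disjoint, so no contribution is counted twice; the set union used here hides the extra care a real reduction would need for other values of n.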
Findings
Improves on state-of-the-art approaches by 5-30% for message sizes up to 8 MiB.
Reduces congestion by a factor of three compared to Bruck's algorithm.
Maintains latency advantage in multidimensional torus networks.
Abstract
AllReduce is a fundamental collective operation in distributed computing and a key performance bottleneck for large-scale training and inference. Its completion time is determined by the number of communication steps, which dominates latency-sensitive workloads, and by the communication distance, which affects both latency- and bandwidth-bound regimes. Direct-connect topologies, such as the torus networks used in Google's TPUv4, are particularly prone to large communication distances due to limited bisection bandwidth. Latency-optimal algorithms such as Bruck's complete AllReduce in log_3 n steps on a bidirectional ring, but incur large communication distances that result in substantial congestion. In contrast, recent approaches such as Swing reduce communication distance and congestion, but are inherently required to perform log_2 n steps to complete AllReduce, sacrificing…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Cloud Computing and Resource Management
