On the Computation Rate of All-Reduce
Yufeng Zhou, Hua Sun

TL;DR
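This paper analyzes the maximum computation rate of the All-Reduce operation over communication networks with arbitrary link bandwidths. It gives general upper and lower bounds, the optimal rate for one class of networks, and bounds within a factor of two of each other for cyclic, complete, and hypercube networks.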
Contribution
It introduces a general upper bound and a general lower bound on the All-Reduce computation rate. The two bounds coincide, and so give the optimal rate, for a class of communication networks.
Findings
Derived a cut-set upper bound on computation rate.
Established a linear programming lower bound based on time sharing.
Identified the optimal rate for a class of communication networks, and obtained the best-known bounds for cyclic, complete, and hypercube networks, where the upper bound is at most twice the lower bound.
Abstract
In the All-Reduce problem, each of the K nodes holds an input and wishes to compute the sum of all K inputs through a communication network where each pair of nodes is connected by a parallel link with arbitrary bandwidth. The computation rate of All-Reduce is defined as the number of sum instances that can be computed per network use. For the computation rate, we provide a cut-set upper bound and a linear programming lower bound based on time (bandwidth) sharing over all schemes that first perform Reduce (aggregating all inputs at one node) and then perform Broadcast (sending the sum from that node to all other nodes). Specializing the two general bounds gives the optimal computation rate for a class of communication networks and the best-known rate bounds (where the upper bound is no more than twice the lower bound) for cyclic, complete, and hypercube networks.
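The Reduce-then-Broadcast structure described in the abstract can be illustrated with a minimal sketch. It only shows the functional behavior of one such scheme on a single spanning tree; it does not model link bandwidths, time sharing, or the paper's linear program. The function name `all_reduce_via_tree`, the `parent` map encoding, and the example inputs are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List


def all_reduce_via_tree(inputs: Dict[int, int],
                        parent: Dict[int, int],
                        root: int) -> Dict[int, int]:
    """Hypothetical helper: All-Reduce as Reduce followed by Broadcast.

    inputs: node id -> that node's input value.
    parent: child id -> parent id, describing a spanning tree rooted at `root`
            (an assumed encoding, not taken from the paper).
    Returns node id -> value held by that node after All-Reduce.
    """
    # Build child lists from the parent map.
    children: Dict[int, List[int]] = {v: [] for v in inputs}
    for child, par in parent.items():
        children[par].append(child)

    # Reduce phase: each node adds its own input to the partial sums
    # received from its children and forwards the result toward the root.
    def reduce_up(v: int) -> int:
        return inputs[v] + sum(reduce_up(c) for c in children[v])

    total = reduce_up(root)

    # Broadcast phase: the root sends the full sum down the same tree.
    result: Dict[int, int] = {}
    stack = [root]
    while stack:
        v = stack.pop()
        result[v] = total
        stack.extend(children[v])
    return result


# Example with K = 4 nodes; node 0 is the root of the assumed tree.
inputs = {0: 3, 1: 5, 2: 7, 3: 11}
parent = {1: 0, 2: 0, 3: 1}
outputs = all_reduce_via_tree(inputs, parent, root=0)
print(outputs)
```

After the call, every node holds the same value, the sum of all K inputs. The lower bound in the abstract comes from sharing time (bandwidth) across many such Reduce-then-Broadcast schemes, which this single-tree sketch does not attempt to capture.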
Taxonomy
Topics: Interconnection Networks and Systems · Cooperative Communication and Network Coding · Advanced Optical Network Technologies
