Bandwidth Optimal Pipeline Schedule for Collective Communication
Liangyu Zhao, Arvind Krishnamurthy

TL;DR
This paper introduces a strongly polynomial-time algorithm that generates bandwidth-optimal allgather and reduce-scatter schedules for any network topology, with or without switches.
Contribution
It provides a universal, provably bandwidth-optimal scheduling algorithm for allgather and related operations on arbitrary network topologies modeled as directed graphs.
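The optimality claim can be made concrete with a standard cut argument. This is an assumption here, since this page does not state the paper's exact bound. Take any vertex set S whose complement still contains a compute node. Every shard held by a compute node in S must cross the edges leaving S, so allgather time is at least (compute shards in S × shard size) / (total bandwidth out of S). Switches are ordinary vertices in the graph but contribute no shards. The brute-force Python sketch below uses hypothetical function and parameter names and enumerates every cut, so it only suits small graphs. The paper's algorithm, by contrast, is strongly polynomial.

```python
from fractions import Fraction
from itertools import combinations

def allgather_cut_bound(nodes, compute, cap, shard_size=1):
    """Largest cut lower bound on allgather time (hypothetical helper).

    nodes:   all vertices, compute nodes and switches alike
    compute: set of compute vertices; each starts with one shard
    cap:     dict (u, v) -> integer bandwidth of directed link u->v
    Returns the bound in units of shard_size / bandwidth, or None if
    some shard can never leave its cut (allgather is impossible).
    """
    best = Fraction(0)
    nodes = list(nodes)
    for r in range(1, len(nodes)):
        for subset in combinations(nodes, r):
            S = set(subset)
            # The complement must hold a compute node that still needs S's shards.
            if compute <= S:
                continue
            senders = len(S & compute)
            if senders == 0:
                continue  # a cut of switches only carries no shards
            # Total bandwidth of directed edges leaving S.
            out_bw = sum(c for (u, v), c in cap.items() if u in S and v not in S)
            if out_bw == 0:
                return None
            best = max(best, Fraction(senders * shard_size, out_bw))
    return best
```

Consider a star of three compute nodes behind one switch with unit links in both directions. There, the cut S = {a, b, switch} gives 2, because node c must receive two shards through a single unit-bandwidth link.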
Findings
Achieves bandwidth optimality for allgather and reduce-scatter.
Works on arbitrary network topologies, including switches and links with heterogeneous capacities.
Extensible to other collective communication operations.
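Reduce-scatter is commonly obtained from allgather by time reversal. This is one plausible reading of the extensibility claim, but the page does not describe the paper's construction. The sketch assumes a store-and-forward schedule in which every node receives each chunk exactly once and forwards it only at later steps. `Send` and `reduce_scatter_from_allgather` are hypothetical names.

```python
from typing import List, NamedTuple

class Send(NamedTuple):
    step: int   # time step at which the transfer happens
    src: str    # sending node
    dst: str    # receiving node
    chunk: str  # shard being moved (named after its owner)

def reduce_scatter_from_allgather(schedule: List[Send]) -> List[Send]:
    """Time-reverse an allgather schedule into a reduce-scatter schedule.

    In allgather, chunk c starts at its owner and fans out to every node.
    Replaying the transfers backwards (last step first, each edge flipped)
    makes every node's partial copy of c flow back toward the owner; each
    receiver adds the incoming partial to its own before forwarding it.
    Flipping u->v into v->u means the result runs on the reverse topology,
    which is the same graph when every link is bidirectional.
    """
    if not schedule:
        return []
    last = max(s.step for s in schedule)
    reversed_sends = (Send(last - s.step, s.dst, s.src, s.chunk) for s in schedule)
    return sorted(reversed_sends, key=lambda s: (s.step, s.src, s.dst, s.chunk))
```

Take a three-node unidirectional ring a→b→c→a. Its reversed schedule runs on c→b→a→c, and after two steps each owner holds the sum of all three partials of its chunk.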
Abstract
We present a strongly polynomial-time algorithm that generates bandwidth-optimal allgather/reduce-scatter schedules on any network topology, with or without switches. Our algorithm constructs pipeline schedules that provably achieve the best possible bandwidth performance on a given topology. To provide a universal solution, we model the network topology as a directed graph with heterogeneous link capacities and represent switches directly as vertices in the graph. The algorithm is strongly polynomial-time with respect to the topology size. This work relies heavily on prior graph-theoretic results on edge-disjoint spanning trees and edge splitting. While we focus on allgather, the methods in this paper can be easily extended to generate schedules for reduce, broadcast, reduce-scatter, and allreduce.
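The edge-disjoint spanning tree results that the abstract mentions include Edmonds' branching theorem. In a directed graph, the maximum number of arc-disjoint spanning arborescences rooted at r equals the minimum, over all other vertices v, of the maximum number of arc-disjoint r→v paths. The Python sketch below computes that number with Edmonds-Karp max flow and reads each integer capacity as that many parallel unit arcs. It illustrates the cited theory under these assumptions and is not the paper's algorithm. It considers a single root, spans every vertex (switches included), and ignores both contention among different roots and the edge-splitting step. All names are hypothetical.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp max flow; cap maps (u, v) -> integer capacity."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in cap.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # allow traversal of the reverse residual edge
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Walk back from t to s, then push the bottleneck amount.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[e] for e in path)
        for u, v in path:
            residual[(u, v)] -= push
            residual[(v, u)] += push
        flow += push

def arborescence_packing_number(nodes, cap, root):
    """Max number of arc-disjoint spanning arborescences rooted at root
    (Edmonds' branching theorem): min over v != root of maxflow(root, v)."""
    return min(max_flow(cap, root, v) for v in nodes if v != root)
```

On a bidirectional four-node ring with unit links, the result is 2 (one clockwise and one counterclockwise path). A root could therefore split its shard in half and send each half along its own tree.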
Taxonomy
Topics: Software-Defined Networks and 5G · Caching and Content Delivery · Opportunistic and Delay-Tolerant Networks
