Enabling Reconfiguration-Communication Overlap for Collective Communication in Optical Networks
Changbo Wu, Zhuolong Yu, Gongming Zhao, Hongli Xu

TL;DR
SWOT is a demand-aware optical network framework that overlaps topology reconfiguration with data transmission, reducing collective communication time in distributed machine learning by up to 89.7% in simulations.
Contribution
SWOT introduces intra-collective reconfiguration, which dynamically aligns optical network resources with the communication patterns of a collective and so avoids the inefficiency of static topologies.
Findings
Reduces communication time by up to 89.7% in simulations.
Effectively overlaps reconfiguration latency with data transmission.
Demonstrates robustness to varying optical resources and delays.
Abstract
Collective communication (CC) is critical for scaling distributed machine learning (DML). The predictable traffic patterns of DML present a great opportunity for applying optical network technologies. Optical networks with reconfigurable topologies promise high bandwidth and low latency for collective communications. However, existing approaches face inherent limitations: static topologies are inefficient for the dynamic communication patterns within a CC algorithm, while reconfiguring the topology to match every step of the algorithm incurs significant overhead. In this paper, we propose SWOT, a demand-aware optical network framework that employs "intra-collective reconfiguration" to dynamically align network resources with CC traffic patterns. SWOT hides reconfiguration latency by overlapping it with data transmission through three key techniques: Heterogeneous Message…
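The trade-off described in the abstract can be illustrated with a toy timing model. This is a minimal sketch under stated assumptions, not SWOT's actual scheduler: the function names, the per-step transmission times, and the single fixed reconfiguration delay are all hypothetical. It assumes the initial topology is already in place and that spare optical resources allow the next topology to be prepared while the current step is still transmitting.

```python
from typing import Sequence


def serial_time(step_times: Sequence[float], reconfig_delay: float) -> float:
    """Per-step reconfiguration with no overlap (hypothetical baseline).

    Every transition between consecutive steps pays the full
    reconfiguration delay while the links sit idle.
    """
    transitions = max(len(step_times) - 1, 0)
    return sum(step_times) + reconfig_delay * transitions


def overlapped_time(step_times: Sequence[float], reconfig_delay: float) -> float:
    """Reconfiguration for step i+1 starts when step i starts transmitting.

    Assumption: spare ports exist, so only the part of the delay that
    outlasts the current transmission stalls the collective.
    """
    total = 0.0
    for i, t in enumerate(step_times):
        total += t
        if i < len(step_times) - 1:
            # Stall only for the portion of the delay not hidden by step i.
            total += max(0.0, reconfig_delay - t)
    return total


# Hypothetical three-step collective; times are in arbitrary units.
steps = [4.0, 2.0, 1.0]
delay = 3.0
print(serial_time(steps, delay))      # 13.0
print(overlapped_time(steps, delay))  # 8.0
```

In this model the overlap hides the delay completely whenever a step transmits for at least as long as the reconfiguration takes. Short steps still stall for the remainder, which is why the ratio between message size and reconfiguration delay matters.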
