CBA: Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks
Dianxuan Fu, Xiaomin Liu, Yihao Zhang, Shikui Shen, Weisheng Hu, Qunbi Zhuge

TL;DR
This paper introduces a communication-bound-aware resource assignment framework for pipeline-parallel distributed large language model training over multi-datacenter optical networks, lowering iteration time by 31.25% and blocking requests by 13.20% compared to baselines.
Contribution
The paper presents a communication-bound-aware cross-domain resource assignment method tailored for pipeline-parallel training in multi-DC optical networks, which shortens iteration time and lowers request blocking relative to baseline approaches.
Findings
Lowered iteration time by 31.25% compared to baselines.
Reduced blocking requests by 13.20% compared to baselines.
Enhanced resource utilization in multi-DC optical networks.
Abstract
We propose a communication-bound-aware cross-domain resource assignment framework for pipeline-parallel distributed training over multi-datacenter optical networks, which lowers iteration time by 31.25% and reduces blocking requests by 13.20% compared to baselines.
Taxonomy
Topics: Advanced Optical Network Technologies · Optical Network Technologies · Cloud Computing and Resource Management
