Beyond Traffic Matrix: DELTA -- A DAG-Aware OCS Logical Topology Optimization for AIDCs
Niangen Ye, Jingya Liu, Weiqiang Sun, Weisheng Hu

TL;DR
DELTA is a DAG-aware logical topology optimization framework for OCS-based AI data centers. It works from the workload's computation-communication DAG rather than a traffic matrix, and uses MILP modeling to reduce both optical port usage and communication time.
Contribution
It introduces a DAG-based optimization method built on variable-length intervals, together with acceleration strategies that scale it to large AI data centers.
Findings
Reduces communication time by up to 17.5% compared to baselines.
Cuts optical port consumption by at least 20%.
Narrows the workload performance gap by up to 26.1%.
Abstract
The rapid scaling of large language models (LLMs) exacerbates communication bottlenecks in AI data centers (AIDCs). To overcome this, optical circuit switches (OCS) are increasingly adopted for their superior bandwidth capacity and energy efficiency. However, their reconfiguration overhead precludes intra-iteration topology updates, so a static topology must be engineered a priori to absorb time-varying LLM traffic. Existing methods engineer these topologies from traffic matrices. This representation obscures the bursty, concurrent bandwidth demands dictated by parallelization strategies and fails to account for the independent channels required for concurrent communication. To address this, we propose DELTA, an efficient logical topology optimization framework for AIDCs that leverages the computation-communication directed acyclic graph (DAG) to encode time-varying…
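The abstract's point that a traffic matrix hides concurrent demand can be illustrated with a toy example. The sketch below is not DELTA's MILP formulation; it is a minimal illustration under assumed inputs. The task tuples, node names, and the helper functions `traffic_matrix` and `peak_concurrent_flows` are hypothetical, and the start/end times stand in for the ordering a computation-communication DAG would impose.

```python
from collections import defaultdict

# Hypothetical communication tasks: (src, dst, volume, start, end).
# In `sequential`, the DAG forces A->C to wait for A->B.
# In `concurrent`, both transfers are released at the same time.
sequential = [
    ("A", "B", 4, 0, 1),
    ("A", "C", 4, 1, 2),
]
concurrent = [
    ("A", "B", 4, 0, 1),
    ("A", "C", 4, 0, 1),
]


def traffic_matrix(tasks):
    """Aggregate total volume per (src, dst) pair, discarding all timing."""
    tm = defaultdict(int)
    for src, dst, volume, _start, _end in tasks:
        tm[(src, dst)] += volume
    return dict(tm)


def peak_concurrent_flows(tasks, node):
    """Maximum number of simultaneously active flows leaving `node`."""
    events = []
    for src, _dst, _volume, start, end in tasks:
        if src == node:
            events.append((start, 1))   # flow becomes active
            events.append((end, -1))    # flow finishes
    # At equal timestamps, process finishes (-1) before starts (+1),
    # so back-to-back flows are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    active = peak = 0
    for _time, delta in events:
        active += delta
        peak = max(peak, active)
    return peak


if __name__ == "__main__":
    print(traffic_matrix(sequential) == traffic_matrix(concurrent))
    print(peak_concurrent_flows(sequential, "A"))
    print(peak_concurrent_flows(concurrent, "A"))
```

Both schedules produce the same traffic matrix, yet node A needs one outgoing channel at a time in the first case and two in the second. That difference is the kind of information a timing-aware representation keeps and an aggregated matrix drops.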
