ChipLight: Cross-Layer Optimization of Chiplet Design with Optical Interconnects for LLM Training
Kangbo Bai, Zhantong Zhu, Yifan Ding, Tianyu Jia

TL;DR
ChipLight introduces a cross-layer optimization framework combining chiplet and optical interconnect technologies to enhance large language model training efficiency in distributed systems.
Contribution
It presents a novel multi-objective design and optimization method for training clusters that integrates chiplet architecture, training strategies, and optical network topology.
Findings
Significantly improves the efficiency of large-scale LLM training.
Provides valuable design insights for future training cluster development.
Optimizes communication performance by co-designing chiplet architecture, training parallel strategy, and OI network topology.
Abstract
In large-scale distributed LLM training, communication between devices becomes the key performance bottleneck. Chiplet technology can integrate multiple dies into a package to scale up node performance with higher bandwidth. Meanwhile, optical interconnect (OI) technology offers long-reach, high-bandwidth links, making it well suited for scale-out networks. The combination of these two technologies has the potential to overcome communication bottlenecks within and across packages. In this work, we present ChipLight, a cross-layer multi-objective design and optimization method for training clusters leveraging chiplet and OI. We first abstract an architecture model for such complex clusters, co-optimizing chiplet architecture, training parallel strategy, and OI network topology. Based on such models, we tailor the design space exploration flow by combining both black-box and white-box…
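To make the phrase "multi-objective" concrete, here is a minimal sketch of Pareto filtering, a standard way to compare candidates when there is more than one objective to minimize. It is not the ChipLight method: the abstract does not specify the exploration algorithm, and the candidate names, the two objectives (training time and cost), and all numbers below are hypothetical placeholders chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: two objectives per design, both to be minimized,
# e.g. (training_time, cost). The paper's real objectives may differ.
Objectives = Tuple[float, float]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: Dict[str, Objectives]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return [
        name
        for name, obj in candidates.items()
        if not any(
            dominates(other, obj)
            for other_name, other in candidates.items()
            if other_name != name
        )
    ]


# Hypothetical design points with made-up objective values (not from the paper).
candidates = {
    "A": (10.0, 5.0),
    "B": (8.0, 7.0),
    "C": (12.0, 6.0),  # worse than A in both objectives
    "D": (8.0, 9.0),   # same time as B but higher cost
}

front = pareto_front(candidates)
print(front)
```

In this toy example only designs A and B survive, since neither is better than the other in both objectives at once. A real exploration flow would replace the hand-written dictionary with objective values produced by an architecture model of the cluster.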
