WATOS: Efficient LLM Training Strategies and Architecture Co-exploration for Wafer-scale Chip
Huizheng Wang, Zichuan Wang, Hongbin Wang, Jingxiang Hou, Taiquan Wei, Chao Li, Yang Hu, Shouyi Yin

TL;DR
WATOS is a co-exploration framework that jointly optimizes wafer-scale chip architecture and LLM training strategies, improving training throughput by exploiting high die-to-die (D2D) bandwidth and balancing compute, memory, and communication resources within the limited wafer area.
Contribution
WATOS introduces a configurable hardware template and a co-exploration approach that jointly optimizes wafer-scale architecture and LLM training strategies, addressing the trade-offs among compute, memory, and communication resources.
Findings
Achieves 2.74x throughput improvement over Megatron.
Achieves 1.53x throughput improvement over Cerebras' strategy.
Provides new insights into wafer-scale architecture design for LLM training.
Abstract
Training large language models (LLMs) imposes extreme demands on computation, memory capacity, and interconnect bandwidth, driven by their ever-increasing parameter scales and intensive data movement. Wafer-scale integration offers a promising solution by densely integrating multiple single-die chips with high-speed die-to-die (D2D) interconnects. However, the limited wafer area necessitates trade-offs among compute, memory, and communication resources. Fully harnessing the potential of wafer-scale integration while mitigating its architectural constraints is essential for maximizing LLM training performance. This imposes significant challenges for the co-optimization of architecture and training strategies. Unfortunately, existing approaches all fall short in addressing these challenges. To bridge the gap, we propose WATOS, a co-exploration framework for LLM training strategy and…
Taxonomy
Topics: Embedded Systems Design Techniques · Parallel Computing and Optimization Techniques · VLSI and FPGA Design Techniques
