A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM
Shaoke Xi, ChonLam Lao, Boyi Jia, Jiaqi Gao, Zhipeng Zhang, Jiamin Cao, Brian Sutioso, Erci Xu, Minlan Yu, Kui Ren, Yong Li, Zhengping Qian, Ennan Zhai, and Jingren Zhou

TL;DR
PrismLLM enables faithful emulation of large-scale LLM training on only a few GPUs, accurately reproducing the performance and memory behavior of much larger GPU clusters.
Contribution
PrismLLM introduces a high-fidelity emulation framework that decouples large-scale execution from physical cluster size, so engineers can debug and tune training frameworks without access to a production-scale cluster.
Findings
Achieves 0.58% average error in iteration time
Keeps error in peak GPU memory usage below 0.01%
Emulates clusters of up to 8192 GPUs using fewer than 1% as many physical GPUs
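To make the numbers above concrete, here is a minimal Python sketch of the arithmetic behind them. It rests on two assumptions that this summary does not confirm: that "fewer than 1%" is a strict bound on the physical GPU count, and that the error figures are mean absolute relative errors against measurements from a real cluster. The function names and the sample iteration times are hypothetical and are not taken from the paper.

```python
import math


def max_physical_gpus(emulated_gpus: int, fraction: float = 0.01) -> int:
    """Largest whole GPU count strictly below `fraction` of the emulated cluster.

    Assumption: "fewer than 1%" is read as a strict inequality.
    """
    return math.ceil(emulated_gpus * fraction) - 1


def mean_relative_error_pct(emulated, measured):
    """Mean absolute relative error, in percent.

    Assumption: this is one plausible definition of "average error";
    the summary does not state which metric the authors use.
    """
    errors = [abs(e - m) / m for e, m in zip(emulated, measured)]
    return 100.0 * sum(errors) / len(errors)


# An 8192-GPU emulated cluster under a strict 1% budget.
print(max_physical_gpus(8192))  # 81

# Made-up iteration times in seconds, for illustration only.
print(mean_relative_error_pct([10.1, 19.9], [10.0, 20.0]))
```

Under the strict reading, emulating 8192 GPUs would take at most 81 physical GPUs, since 1% of 8192 is 81.92.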
Abstract
Large language model (LLM) training today runs on clusters spanning thousands of GPUs. While this scale enables rapid model advances, developing, debugging, and performance-tuning the training framework inevitably becomes complex and costly. This is because engineers often need to reproduce production behaviors to diagnose failures or evaluate optimizations, thereby demanding frequent and even exclusive access to production-scale clusters -- which becomes increasingly hard given that the majority of GPUs are already committed to production workloads. Simulation relies on complex performance models that are difficult to maintain, and downscaled experiments often fail to capture scale-dependent behaviors. We present PrismLLM to decouple large-scale execution from the need to access large clusters, enabling engineers to run and observe ranks of interest under faithful large-scale…
