A Few GPUs, A Whole Lotta Scale: Faithful LLM Training Emulation with PrismLLM

Shaoke Xi; ChonLam Lao; Boyi Jia; Jiaqi Gao; Zhipeng Zhang; Jiamin Cao; Brian Sutioso; Erci Xu; Minlan Yu; Kui Ren; Yong Li; Zhengping Qian; Ennan Zhai; and Jingren Zhou

arXiv:2605.15617 · cs.DC · May 18, 2026

TL;DR

PrismLLM enables faithful emulation of large-scale LLM training using only a few GPUs, accurately reproducing the performance and memory behavior of large GPU clusters.

Contribution

PrismLLM introduces a high-fidelity emulation framework that decouples large-scale execution from physical cluster size, facilitating efficient debugging and optimization.

Findings

1. Achieves 0.58% average error in iteration time
2. Keeps error in peak GPU memory usage below 0.01%
3. Emulates clusters of up to 8192 GPUs while using fewer than 1% as many physical GPUs
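To make the findings concrete, the sketch below works through two of the numbers. The listing does not say how "average error" is computed, so the sketch assumes a mean absolute relative error between emulated and measured values; that metric choice and the helper names `max_physical_gpus` and `mean_relative_error_pct` are hypothetical, not PrismLLM's own code. It also shows that "fewer than 1%" of an 8192-GPU cluster means at most 81 physical GPUs.

```python
def max_physical_gpus(emulated_gpus: int, percent: int = 1) -> int:
    """Largest GPU count strictly below `percent`% of the emulated cluster size.

    Uses integer arithmetic: find the largest n with 100 * n < emulated_gpus * percent.
    """
    if emulated_gpus <= 0 or percent <= 0:
        raise ValueError("emulated_gpus and percent must be positive")
    return (emulated_gpus * percent - 1) // 100


def mean_relative_error_pct(measured, emulated):
    """Mean absolute relative error (in percent) of emulated vs. measured values.

    Hypothetical metric: the source does not define how average error is computed.
    """
    if len(measured) != len(emulated) or not measured:
        raise ValueError("need equal-length, non-empty sequences")
    errors = [abs(e - m) / m for m, e in zip(measured, emulated)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    # 1% of 8192 is 81.92, so "fewer than 1%" allows at most 81 physical GPUs.
    print(max_physical_gpus(8192))
    # Illustrative (made-up) iteration times in seconds, not results from the paper.
    print(mean_relative_error_pct([100.0, 200.0], [101.0, 198.0]))
```

With these illustrative inputs, both emulated iteration times are off by 1% relative to the measured ones, so the mean relative error is 1%. Under this metric, the reported 0.58% would mean that emulated iteration times deviate from real-cluster measurements by about half that amount on average.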

Abstract

Large language model (LLM) training today runs on clusters spanning thousands of GPUs. While this scale enables rapid model advances, it also makes developing, debugging, and performance-tuning the training framework complex and costly. Engineers often need to reproduce production behaviors to diagnose failures or evaluate optimizations, which demands frequent and even exclusive access to production-scale clusters. Such access is increasingly hard to obtain because most GPUs are already committed to production workloads. Existing alternatives fall short: simulation relies on complex performance models that are difficult to maintain, and downscaled experiments often fail to capture scale-dependent behaviors. We present PrismLLM to decouple large-scale execution from the need to access large clusters, enabling engineers to run and observe ranks of interest under faithful large-scale…
