PRISM: Dynamic Primitive-Based Forecasting for Large-Scale GPU Cluster Workloads
Xin Wu, Fei Teng, Xingwang Li, Bin Zheng, Qiang Duan

TL;DR
PRISM is a novel forecasting framework that uses primitive-based decomposition and spectral refinement to accurately predict volatile, heterogeneous GPU workloads, improving resource management in AI infrastructure.
Contribution
It introduces a dual representation approach combining dictionary-driven decomposition with spectral refinement for workload forecasting.
Findings
Achieves state-of-the-art forecasting accuracy on large-scale GPU traces
Reduces burst-phase errors significantly
Provides a robust foundation for dynamic GPU resource management
Abstract
Accurately forecasting GPU workloads is essential for AI infrastructure, enabling efficient scheduling, resource allocation, and power management. Modern workloads are highly volatile, multi-periodic, and heterogeneous, making them challenging for traditional predictors. We propose PRISM, a primitive-based compositional forecasting framework that combines dictionary-driven temporal decomposition with adaptive spectral refinement. This dual representation extracts stable, interpretable workload signatures across diverse GPU jobs. Evaluated on large-scale production traces, PRISM achieves state-of-the-art results. It significantly reduces burst-phase errors, providing a robust, architecture-aware foundation for dynamic resource management in GPU-powered AI platforms.
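The abstract does not spell out how the two representations interact, so the following is only a minimal sketch of the general idea, not PRISM's actual algorithm. It assumes a fixed, hand-built dictionary of primitives (constant, linear trend, and sinusoids at chosen periods) fitted by ordinary least squares, followed by a simple spectral step that keeps the k largest FFT components of the residual. The function names `build_dictionary`, `decompose`, and `spectral_refine`, the choice of periods, and the top-k rule are all hypothetical.

```python
import numpy as np

def build_dictionary(t, periods=(24, 168)):
    """Hypothetical primitive dictionary: constant, linear trend, sin/cos pairs."""
    t = np.asarray(t, dtype=float)
    atoms = [np.ones_like(t), t / max(len(t), 1)]
    for p in periods:
        atoms.append(np.sin(2 * np.pi * t / p))
        atoms.append(np.cos(2 * np.pi * t / p))
    return np.stack(atoms, axis=1)  # shape (len(t), 2 + 2 * len(periods))

def decompose(y, D):
    """Least-squares fit of y onto the dictionary; returns coefficients and residual."""
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    return coef, y - D @ coef

def spectral_refine(residual, k=3):
    """Keep only the k largest-magnitude FFT bins of the residual (assumed rule)."""
    spec = np.fft.rfft(residual)
    keep = np.argsort(np.abs(spec))[-k:]
    filtered = np.zeros_like(spec)
    filtered[keep] = spec[keep]
    return np.fft.irfft(filtered, n=len(residual))

# Synthetic utilization trace: a 24-step cycle the dictionary knows about,
# plus a 7-step cycle it does not, which the spectral step should recover.
t = np.arange(168)
y = 5 + 3 * np.sin(2 * np.pi * t / 24) + 1.5 * np.sin(2 * np.pi * t / 7)

D = build_dictionary(t, periods=(24,))
coef, resid = decompose(y, D)
refined = spectral_refine(resid, k=3)

print("residual norm after dictionary fit:", np.linalg.norm(resid))
print("residual norm after spectral step: ", np.linalg.norm(resid - refined))
```

In this toy setting the dictionary captures the structure it was built for, and the spectral step picks up periodic structure the dictionary missed. A learned dictionary and an adaptive choice of frequencies, as the abstract suggests, would replace both hand-set pieces.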
Taxonomy
Topics: Cloud Computing and Resource Management · Parallel Computing and Optimization Techniques · Big Data and Digital Economy
