From Servers to Sites: Compositional Power Trace Generation of LLM Inference for Infrastructure Planning
Grant Wilkins, Fiodar Kazhamiaka, Ram Rajagopal

TL;DR
This paper presents a compositional framework for generating detailed power traces of LLM inference workloads in datacenters, enabling more accurate infrastructure planning and analysis.
Contribution
It introduces a trace-generation method that models LLM inference power as two components: workload-driven transitions among operating states (prefill, decode, and idle) and configuration-specific power distributions within those states. The method is validated across multiple settings.
Findings
Median absolute energy error below 5% for most configurations
Preserves temporal autocorrelation in generated traces
Supports detailed infrastructure analysis beyond static assumptions
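To make the first two findings concrete, the following is a minimal sketch of how an absolute energy error and a lag autocorrelation could be computed for a pair of traces. The function names, the uniform sampling interval `dt_s`, and the toy traces are assumptions for illustration; the summary does not state the paper's exact metric definitions.

```python
from statistics import median


def energy_wh(power_w, dt_s=1.0):
    """Energy of a uniformly sampled power trace, in watt-hours."""
    return sum(power_w) * dt_s / 3600.0


def abs_energy_error_pct(measured_w, generated_w, dt_s=1.0):
    """Absolute energy error of a generated trace relative to a measured one, in percent."""
    e_meas = energy_wh(measured_w, dt_s)
    e_gen = energy_wh(generated_w, dt_s)
    return abs(e_gen - e_meas) / e_meas * 100.0


def autocorrelation(trace, lag):
    """Sample autocorrelation of a trace at a given non-negative lag."""
    n = len(trace)
    mean = sum(trace) / n
    var = sum((x - mean) ** 2 for x in trace)
    if var == 0 or lag >= n:
        return 0.0
    cov = sum((trace[i] - mean) * (trace[i + lag] - mean) for i in range(n - lag))
    return cov / var


# Toy (hypothetical) traces in watts, one sample per second.
measured = [100.0, 200.0, 300.0, 200.0]
generated = [110.0, 190.0, 310.0, 210.0]

error_pct = abs_energy_error_pct(measured, generated)

# With one error value per configuration, the median summarizes them.
median_error_pct = median([error_pct])

# Comparing lag-1 autocorrelation checks whether temporal structure is preserved.
acf_gap = abs(autocorrelation(measured, 1) - autocorrelation(generated, 1))
```

In practice the error would be computed once per configuration and the median taken over all of them; a small gap between measured and generated autocorrelation at several lags would suggest the temporal structure carries over.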
Abstract
Datacenter operators and electrical utilities rely on power traces at different spatiotemporal scales. Operators use fine-grained traces for provisioning, facility management, and scheduling, while utilities use site-level load profiles for capacity and interconnection planning. Existing datacenter power models do not capture LLM inference workloads, in which GPUs shift rapidly among compute-intensive prefill, lower-power decode, and idle states, and facility demand depends on how these states evolve and synchronize across many devices. We show that LLM inference power can be represented compositionally through two components: workload-driven transitions among operating states and configuration-specific power distributions within those states. Building on this observation, we develop a trace-generation framework that learns from measured traces and synthesizes power profiles for new…
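The two-component idea in the abstract can be sketched as follows. This is only an illustration under loud assumptions: a fixed Markov chain stands in for the workload-driven transitions, each state's power is drawn from a Gaussian, and GPUs are simulated independently, so cross-device synchronization is ignored. All probabilities and wattages are hypothetical and are not taken from the paper.

```python
import random

STATES = ("idle", "prefill", "decode")

# Hypothetical per-step transition probabilities (stand-in for workload-driven transitions).
TRANSITIONS = {
    "idle":    {"idle": 0.90, "prefill": 0.10, "decode": 0.00},
    "prefill": {"idle": 0.00, "prefill": 0.60, "decode": 0.40},
    "decode":  {"idle": 0.05, "prefill": 0.05, "decode": 0.90},
}

# Hypothetical (mean, std) power in watts per state for a single configuration.
POWER_W = {
    "idle": (80.0, 5.0),
    "prefill": (650.0, 30.0),
    "decode": (400.0, 25.0),
}


def generate_gpu_trace(n_steps, rng, start="idle"):
    """Sample one GPU's power trace: draw power from the current state, then transition."""
    state = start
    trace = []
    for _ in range(n_steps):
        mean, std = POWER_W[state]
        trace.append(max(0.0, rng.gauss(mean, std)))  # clamp so power is never negative
        probs = TRANSITIONS[state]
        state = rng.choices(STATES, weights=[probs[s] for s in STATES])[0]
    return trace


def generate_site_trace(n_gpus, n_steps, seed=0):
    """Sum independent per-GPU traces into an aggregate load profile."""
    rng = random.Random(seed)
    gpu_traces = [generate_gpu_trace(n_steps, rng) for _ in range(n_gpus)]
    return [sum(step) for step in zip(*gpu_traces)]


site_trace = generate_site_trace(n_gpus=8, n_steps=60, seed=42)
```

Swapping the `POWER_W` table corresponds to changing the configuration, while swapping `TRANSITIONS` corresponds to changing the workload; keeping the two separate is what makes the representation compositional.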
Taxonomy
Topics: Cloud Computing and Resource Management · Software System Performance and Reliability · Parallel Computing and Optimization Techniques
