Power Aware Dynamic Reallocation For Inference
Yiwei Jiang, Sangeeta Chowdhary, Nathaniel Morris, Rutwik Jain, Srilatha Manne, Sam Bayliss

TL;DR
This paper introduces RAPID, a power-aware framework for disaggregated LLM inference that dynamically manages GPU roles and power budgets to sustain performance within strict power limits, improving SLO attainment by up to 2x at peak load.
Contribution
RAPID is the first framework to jointly optimize GPU roles and power budgets for disaggregated inference, achieving better performance under power constraints.
Findings
Up to 2x improvement in SLO attainment at peak load.
Significant performance gains over static power assignment.
Enhanced application consistency under power caps.
Abstract
Disaggregation has emerged as a powerful strategy for optimizing large language model (LLM) inference by separating compute-intensive prefill and memory-bound decode phases across specialized GPUs. This separation improves utilization and throughput under fixed hardware capacity. However, as model and cluster scales grow, power, rather than compute, has become the dominant limiter of overall performance and cost efficiency. In this paper, we propose RAPID, a power-aware disaggregated inference framework that jointly manages GPU roles and power budgets to sustain goodput within strict power caps. RAPID utilizes static and dynamic power reallocation in addition to GPU reallocation to improve performance under fixed power bounds. RAPID improves overall performance and application consistency beyond what is achievable in current disaggregation solutions, resulting in up to a 2x improvement…
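As a rough illustration of what reallocating power between prefill and decode pools under a fixed cap can look like, the following Python sketch moves a bounded number of watts from the less-loaded pool to the more-loaded one while keeping the total budget unchanged. It is a toy model, not RAPID's algorithm: the `Pool` type, the `pressure` load signal, the step size, and the per-GPU wattage bounds are all hypothetical, and GPU role reallocation is left out entirely.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """A group of GPUs serving one phase (hypothetical model, not from the paper)."""
    name: str
    gpus: int
    watts_per_gpu: float
    pressure: float  # assumed load signal, e.g. observed latency / SLO target


def total_power(pools):
    """Total power budget currently assigned across the given pools, in watts."""
    return sum(p.gpus * p.watts_per_gpu for p in pools)


def shift_power(donor, receiver, step_w, min_w, max_w):
    """Move power from donor to receiver; the combined budget stays constant."""
    # Watts the donor pool can release without dropping below min_w per GPU.
    give = min(step_w, donor.watts_per_gpu - min_w) * donor.gpus
    # Watts the receiver pool can absorb without exceeding max_w per GPU.
    take = (max_w - receiver.watts_per_gpu) * receiver.gpus
    moved = max(0.0, min(give, take))
    donor.watts_per_gpu -= moved / donor.gpus
    receiver.watts_per_gpu += moved / receiver.gpus
    return moved


def rebalance(prefill, decode, step_w=25.0, min_w=300.0, max_w=700.0, margin=0.1):
    """Shift power toward the pool under more pressure; do nothing if they are close."""
    if prefill.pressure > decode.pressure + margin:
        return shift_power(decode, prefill, step_w, min_w, max_w)
    if decode.pressure > prefill.pressure + margin:
        return shift_power(prefill, decode, step_w, min_w, max_w)
    return 0.0


# Usage: prefill is overloaded, so decode gives up 25 W per GPU.
prefill = Pool("prefill", gpus=4, watts_per_gpu=500.0, pressure=1.4)
decode = Pool("decode", gpus=4, watts_per_gpu=500.0, pressure=0.6)
moved = rebalance(prefill, decode)
print(moved, prefill.watts_per_gpu, decode.watts_per_gpu)
```

The margin parameter is there to avoid oscillating when both pools are under similar load. A real controller would also need to account for how long a power-limit change takes to apply on the hardware.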
Taxonomy
Topics: Advanced Neural Network Applications · Natural Language Processing Techniques · Parallel Computing and Optimization Techniques
