POLCA: Power Oversubscription in LLM Cloud Providers
Pratyush Patel, Esha Choukse, Chaojie Zhang, Íñigo Goiri, Brijesh Warrier, Nithish Mahalingam, Ricardo Bianchini

TL;DR
POLCA is a framework that enables safe power oversubscription in GPU clusters for large language models. It allows 30% more servers to be deployed for inference with minimal performance impact.
Contribution
This paper introduces POLCA, a novel framework for power oversubscription in LLM GPU clusters, addressing challenges in power management and demonstrating significant efficiency gains.
Findings
Power oversubscription can increase the number of deployable servers by 30% for inference.
Inference workloads have substantial headroom for power oversubscription.
POLCA effectively manages power oversubscription with minimal performance loss.
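The 30% figure comes down to provisioning arithmetic. The sketch below is a hypothetical back-of-the-envelope illustration of that arithmetic, not POLCA's method. The budget and per-server wattages are made-up numbers, and `PowerPlan`, `servers_with_oversubscription`, and the other names are introduced here for illustration only. It compares provisioning every server at its rated peak with provisioning at a lower planned figure. The lower figure is only safe if something handles the rare intervals when aggregate draw exceeds the budget.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PowerPlan:
    """Hypothetical inputs for an oversubscription estimate (all in watts)."""
    budget_w: int          # power provisioned for the cluster
    rated_server_w: int    # per-server rated (nameplate) peak power
    planned_server_w: int  # per-server power budgeted under oversubscription


def servers_without_oversubscription(plan: PowerPlan) -> int:
    # Conventional provisioning: assume every server draws its rated peak.
    return plan.budget_w // plan.rated_server_w


def servers_with_oversubscription(plan: PowerPlan) -> int:
    # Oversubscription: budget each server at a lower, observed-peak-based figure.
    return plan.budget_w // plan.planned_server_w


def extra_server_fraction(plan: PowerPlan) -> float:
    # Fractional increase in deployable servers relative to conventional provisioning.
    base = servers_without_oversubscription(plan)
    return (servers_with_oversubscription(plan) - base) / base


def over_budget(draws_w: list[int], budget_w: int) -> bool:
    # True when aggregate draw exceeds the budget; an oversubscribed cluster
    # needs some power-management action for exactly these cases.
    return sum(draws_w) > budget_w


plan = PowerPlan(budget_w=1_000_000, rated_server_w=10_000, planned_server_w=7_500)
print(servers_without_oversubscription(plan))  # 100
print(servers_with_oversubscription(plan))     # 133
print(extra_server_fraction(plan))             # 0.33
```

Integer watts keep the floor division exact. The example's 33% gain follows entirely from the assumed 7,500 W planned figure and is not a result from the paper.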
Abstract
Recent innovation in large language models (LLMs) and their myriad use cases have rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud providers and other enterprises have made substantial plans for growth in their datacenters to support these new workloads. One of the key bottleneck resources in datacenters is power, and as LLM model sizes grow, these workloads are becoming increasingly power intensive. In this paper, we show that there is a significant opportunity to oversubscribe power in LLM clusters. Power oversubscription improves the power efficiency of these datacenters, allowing more servers to be deployed per datacenter, and reduces deployment time, since building new datacenters is slow. We extensively characterize the power consumption patterns of a variety of LLMs and their configurations. We identify the differences between the inference…
