Prediction-Based Power Oversubscription in Cloud Platforms
Alok Kumbhare, Reza Azimi, Ioannis Manousakis, Anand Bonde, Felipe Frujeri, Nithish Mahalingam, Pulkit Misra, Seyyed Ahmad Javadi, Bianca Schroeder, Marcus Fontoura, Ricardo Bianchini

TL;DR
This paper proposes a prediction-based approach to increase power oversubscription in cloud datacenters, specifically in Microsoft Azure, by leveraging workload predictions to safely double oversubscription levels.
Contribution
It introduces predictions of workload performance criticality and virtual machine (VM) resource utilization, used to drive criticality-aware power management, which enables higher oversubscription with minimal impact on performance.
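Criticality-aware power capping can be sketched as follows. This is a minimal illustration, not the paper's implementation. It assumes that each server is tagged critical if any of its VMs is predicted to be performance-critical, and that capping can lower a server's draw down to some floor. The `Server` type, the `plan_caps` function, and the `min_power_w` floor are hypothetical names introduced here. When total draw exceeds the budget, the policy shaves power from non-critical servers first and touches critical servers only if that is not enough.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    power_w: float      # current power draw in watts
    min_power_w: float  # hypothetical lowest draw reachable by capping
    critical: bool      # True if any hosted VM is predicted performance-critical


def plan_caps(servers: list[Server], budget_w: float) -> dict[str, float]:
    """Return {server name: capped power in watts} that fits within budget_w.

    Hypothetical policy: throttle non-critical servers first and fall back to
    critical servers only when non-critical headroom is exhausted.
    """
    caps = {s.name: s.power_w for s in servers}
    excess = sum(s.power_w for s in servers) - budget_w
    if excess <= 0:
        return caps  # no capping needed
    # sorted() is stable and False < True, so non-critical servers come first.
    for s in sorted(servers, key=lambda srv: srv.critical):
        if excess <= 0:
            break
        cut = min(s.power_w - s.min_power_w, excess)
        caps[s.name] = s.power_w - cut
        excess -= cut
    if excess > 0:
        raise RuntimeError("budget unreachable even with every server at its floor")
    return caps


if __name__ == "__main__":
    rack = [
        Server("a", 400, 250, critical=True),
        Server("b", 350, 200, critical=False),
        Server("c", 300, 200, critical=False),
    ]
    print(plan_caps(rack, budget_w=900))
```

With a 900 W budget, the 150 W excess is taken entirely from non-critical server `b`, so critical server `a` keeps its full 400 W. Only when the budget is tighter does the loop reach the critical server.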
Findings
Achieved a 2x increase in oversubscription
Maintained performance of critical workloads
Demonstrated effectiveness in Microsoft Azure infrastructure
Abstract
Datacenter designers rely on conservative estimates of IT equipment power draw to provision resources. This leaves resources underutilized and requires more datacenters to be built. Prior work has used power capping to shave the rare power peaks and add more servers to the datacenter, thereby oversubscribing its resources and lowering capital costs. This works well when the workloads and their server placements are known. Unfortunately, these factors are unknown in public clouds, forcing providers to limit the oversubscription so that performance is never impacted. In this paper, we argue that providers can use predictions of workload performance criticality and virtual machine (VM) resource utilization to increase oversubscription. This poses many challenges, such as identifying the performance-critical workloads from black-box VMs, creating support for criticality-aware power…
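The arithmetic behind oversubscription can be illustrated with a small sketch. It assumes a fixed power budget and that predictions let the provider plan each server at a lower power value than its conservative provisioned peak, relying on power capping to shave the rare peaks. Oversubscription is measured here as the fraction by which the total provisioned (worst-case) power of the deployed servers exceeds the budget; this is one common definition, chosen for illustration. The function names and all numbers are hypothetical, not figures from the paper.

```python
import math


def servers_that_fit(budget_w: float, planning_w_per_server: float) -> int:
    """Number of servers deployable if each is budgeted at planning_w_per_server."""
    return math.floor(budget_w / planning_w_per_server)


def oversubscription(n_servers: int, provisioned_w_per_server: float, budget_w: float) -> float:
    """Fraction by which the servers' total provisioned power exceeds the budget."""
    return n_servers * provisioned_w_per_server / budget_w - 1.0


if __name__ == "__main__":
    budget = 100_000       # hypothetical 100 kW power budget
    conservative = 500     # hypothetical conservative per-server estimate (W)
    predicted_plan = 400   # hypothetical lower planning value enabled by predictions (W)

    n_conservative = servers_that_fit(budget, conservative)
    n_predicted = servers_that_fit(budget, predicted_plan)
    print(n_conservative, oversubscription(n_conservative, conservative, budget))
    print(n_predicted, oversubscription(n_predicted, conservative, budget))
```

With these made-up values, conservative planning fits 200 servers with no oversubscription, while planning at 400 W fits 250 servers, a 25% oversubscription that capping must keep safe.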
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Software System Performance and Reliability
