A Reinforcement Learning Approach for Performance-aware Reduction in Power Consumption of Data Center Compute Nodes
Akhilesh Raj, Swann Perarnau, Aniruddha Gokhale

TL;DR
This paper presents a reinforcement learning-based method to dynamically manage power consumption in data center compute nodes, aiming to reduce energy use without degrading application performance.
Contribution
It introduces a novel RL-based power capping policy that balances energy efficiency and performance using real-time system observations and hardware controls.
Findings
The RL agent reduces the power consumption of compute nodes
Application performance is maintained while power is capped
The approach is implemented and demonstrated on real hardware
Abstract
As Exascale computing becomes a reality, the energy needs of compute nodes in cloud data centers will continue to grow. A common approach to reducing this energy demand is to limit the power consumption of hardware components when workloads are experiencing bottlenecks elsewhere in the system. However, designing a resource controller capable of detecting and limiting power consumption on-the-fly is a complex issue and can also adversely impact application performance. In this paper, we explore the use of Reinforcement Learning (RL) to design a power capping policy on cloud compute nodes using observations on current power consumption and instantaneous application performance (heartbeats). By leveraging the Argo Node Resource Management (NRM) software stack in conjunction with the Intel Running Average Power Limit (RAPL) hardware control mechanism, we design an agent to control the…
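To make the idea in the abstract concrete, here is a minimal sketch of learning a power-capping policy with tabular Q-learning. It is a toy illustration under stated assumptions, not the paper's agent: it makes no calls to Argo NRM or RAPL, and the cap levels, the saturating heartbeat model, the reward shape, and all function names (`heartbeat_rate`, `reward`, `train`, `greedy_cap`) are hypothetical.

```python
import random

# Hypothetical discrete power-cap levels (watts) and cap adjustments.
CAPS_W = [60, 80, 100, 120, 140]
ACTIONS = [-1, 0, 1]  # lower, keep, or raise the cap by one level


def heartbeat_rate(cap_w, saturation_w=100.0, max_rate=50.0):
    """Toy progress model (an assumption): the heartbeat rate grows with the
    cap, then plateaus once the workload is bottlenecked elsewhere."""
    return max_rate * min(cap_w, saturation_w) / saturation_w


def reward(cap_w, energy_weight=0.5):
    """Normalized progress minus a weighted penalty on normalized power."""
    progress = heartbeat_rate(cap_w) / heartbeat_rate(CAPS_W[-1])
    power = cap_w / CAPS_W[-1]
    return progress - energy_weight * power


def next_state(state, action_index):
    """Apply a cap adjustment, clamped to the available cap levels."""
    return min(max(state + ACTIONS[action_index], 0), len(CAPS_W) - 1)


def train(episodes=2000, steps=20, alpha=0.1, gamma=0.9, seed=0):
    """Tabular Q-learning. The state is the index of the current cap.
    Actions are sampled uniformly at random during training, which is
    valid because Q-learning is off-policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in CAPS_W]
    for _ in range(episodes):
        s = rng.randrange(len(CAPS_W))
        for _ in range(steps):
            a = rng.randrange(len(ACTIONS))
            s2 = next_state(s, a)
            r = reward(CAPS_W[s2])
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q


def greedy_cap(q, start_index, steps=10):
    """Follow the learned greedy policy and return the cap it settles on."""
    s = start_index
    for _ in range(steps):
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s = next_state(s, a)
    return CAPS_W[s]


if __name__ == "__main__":
    q_table = train()
    for i, cap in enumerate(CAPS_W):
        print(f"start at {cap} W -> settles at {greedy_cap(q_table, i)} W")
```

In this toy model the learned policy moves the cap toward the level where the simulated heartbeat rate stops improving, which is the trade-off the abstract describes: less power drawn without a loss of progress. A real controller would replace `heartbeat_rate` with measured application heartbeats and apply the chosen cap through the hardware control mechanism.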
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Blockchain Technology Applications and Security
