Sustaining Performance While Reducing Energy Consumption: A Control Theory Approach
Eric Rutten (CTRL-A), Sophie Cerf (CTRL-A), Raphaël Bleuse (CTRL-A), Valentin Reis (ANL), Swann Perarnau (ANL)

TL;DR
This paper introduces a control theory-based method for dynamically managing power in high-performance computing systems to sustain performance while reducing energy consumption, leveraging feedback control and workload monitoring.
Contribution
It presents a novel control-theoretic approach for autonomous, real-time power regulation in HPC systems, integrating system modeling and PI control for energy efficiency.
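To make the idea of PI-based power regulation concrete, here is a minimal sketch of a feedback loop that adjusts a power cap so that a measured progress signal tracks a setpoint. It is an illustration only, not the authors' implementation: the first-order plant model, the gains, the power limits, and all names (PIController, simulate, GAIN, TAU) are hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete PI controller with a clamped output (hypothetical sketch)."""
    kp: float          # proportional gain (assumed value, not from the paper)
    ki: float          # integral gain (assumed value, not from the paper)
    u_min: float       # lowest allowed power cap, in watts
    u_max: float       # highest allowed power cap, in watts
    integral: float = 0.0

    def step(self, setpoint: float, measured: float, dt: float) -> float:
        error = setpoint - measured
        candidate = self.integral + error * dt
        u = self.kp * error + self.ki * candidate
        clamped = min(self.u_max, max(self.u_min, u))
        # Simple anti-windup: keep the new integral only if the output
        # was not saturated on this step.
        if u == clamped:
            self.integral = candidate
        return clamped


# Hypothetical plant: progress follows the power cap with a first-order lag.
GAIN = 0.5   # progress units per watt (assumption)
TAU = 1.0    # time constant in seconds (assumption)


def simulate(steps: int = 200, dt: float = 0.1, setpoint: float = 40.0):
    """Run the closed loop and return the final (progress, power_cap)."""
    ctrl = PIController(kp=1.0, ki=2.0, u_min=40.0, u_max=120.0)
    progress = 0.0
    cap = ctrl.u_min
    for _ in range(steps):
        # Sense progress, compute a new cap, then let the plant respond.
        cap = ctrl.step(setpoint, progress, dt)
        progress += (dt / TAU) * (GAIN * cap - progress)
    return progress, cap


if __name__ == "__main__":
    final_progress, final_cap = simulate()
    print(f"progress={final_progress:.2f}, power cap={final_cap:.2f} W")
```

Under these assumed dynamics the loop settles where the progress error is zero, which for this toy plant means the cap converges to setpoint / GAIN. On a real system the progress measurement and the power actuator would replace the simulated plant, and the gains would come from an identified model rather than the placeholder values used here.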
Findings
Effective power capping without performance loss
Successful deployment on real HPC clusters
Significant energy savings demonstrated
Abstract
Production high-performance computing systems continue to grow in complexity and size. As applications struggle to make use of increasingly heterogeneous compute nodes, maintaining high efficiency (performance per watt) for the whole platform becomes a challenge. Alongside the growing complexity of scientific workloads, this extreme heterogeneity is also an opportunity: as applications dynamically undergo variations in workload, due to phases or data/compute movement between devices, one can dynamically adjust power across compute elements to save energy without impacting performance. With an aim toward an autonomous and dynamic power management strategy for current and future HPC architectures, this paper explores the use of control theory for the design of a dynamic power regulation method. Structured as a feedback loop, our approach, which is novel in computing resource…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed and Parallel Computing Systems · Cloud Computing and Resource Management
