Online GPU Energy Optimization with Switching-Aware Bandits
Xiongxiao Xu, Solomon Abera Bekele, Brice Videau, Kai Shu

TL;DR
This paper presents EnergyUCB, an online bandit-based controller that optimizes GPU energy consumption in HPC systems by balancing energy savings, performance, and switching overhead, demonstrated on real supercomputing workloads.
Contribution
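The paper introduces a practical online GPU energy optimization method based on a multi-armed bandit formulation with switching-aware and QoS-constrained components. It targets three challenges named in the abstract: the performance-energy trade-off of frequency scaling, the exploration-exploitation balance of online control, and the overhead of frequent frequency switching.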
Findings
EnergyUCB achieves significant energy savings in supercomputing workloads.
The QoS-constrained variant reliably maintains user-specified performance budgets.
The method effectively balances exploration, exploitation, and switching overhead in real-time.
Abstract
Energy consumption has become a bottleneck for future computing architectures, from wearable devices to leadership-class supercomputers. Existing energy management techniques largely target CPUs, even though GPUs now dominate power draw in heterogeneous high-performance computing (HPC) systems. Moreover, many prior methods rely on either purely offline or hybrid offline and online training, which is impractical and results in energy inefficiencies during data collection. In this paper, we introduce a practical online GPU energy optimization problem in HPC scenarios. The problem is challenging because (1) GPU frequency scaling exhibits performance-energy trade-offs, (2) online control must balance exploration and exploitation, and (3) frequent frequency switching incurs non-trivial overhead and degrades quality of service (QoS). To address the challenges, we formulate online GPU energy…
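The three challenges listed in the abstract can be illustrated with a small sketch. The code below is not the authors' EnergyUCB algorithm, whose details are not given on this page. It is a generic UCB1 rule with a fixed penalty subtracted from any arm other than the one currently selected, which is one simple way to discourage frequent switching. The class name SwitchingAwareUCB, the helper run_simulation, the penalty value, and the reward values (a hypothetical normalized score in which higher means less energy per unit of work) are all assumptions made for illustration.

```python
import math
import random


class SwitchingAwareUCB:
    """Generic UCB1 with a fixed switching penalty (illustrative, not EnergyUCB)."""

    def __init__(self, n_arms, switch_penalty=0.05, explore_coef=2.0):
        self.n_arms = n_arms                  # one arm per candidate GPU frequency
        self.switch_penalty = switch_penalty  # assumed cost of changing frequency
        self.explore_coef = explore_coef
        self.counts = [0] * n_arms            # pulls per arm
        self.means = [0.0] * n_arms           # running mean reward per arm
        self.current = None                   # arm selected in the previous step
        self.t = 0

    def select(self):
        self.t += 1
        # Try every arm once before applying the UCB rule.
        for arm in range(self.n_arms):
            if self.counts[arm] == 0:
                return arm

        def score(arm):
            bonus = math.sqrt(self.explore_coef * math.log(self.t) / self.counts[arm])
            # Leaving the current arm costs a fixed penalty.
            penalty = 0.0 if arm == self.current else self.switch_penalty
            return self.means[arm] + bonus - penalty

        return max(range(self.n_arms), key=score)

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
        self.current = arm


def run_simulation(true_rewards, steps=2000, noise=0.05, seed=0):
    """Simulate noisy rewards for hypothetical frequency settings."""
    rng = random.Random(seed)
    agent = SwitchingAwareUCB(len(true_rewards))
    switches = 0
    for _ in range(steps):
        previous = agent.current
        arm = agent.select()
        if previous is not None and arm != previous:
            switches += 1
        reward = true_rewards[arm] + rng.gauss(0.0, noise)
        agent.update(arm, reward)
    return agent, switches


if __name__ == "__main__":
    # Hypothetical mean rewards for four frequency settings.
    agent, switches = run_simulation([0.2, 0.5, 0.9, 0.4])
    print("pulls per arm:", agent.counts)
    print("frequency switches:", switches)
```

In a real deployment the simulated reward would be replaced by a measurement derived from GPU power and progress counters, and the penalty would need to reflect the measured cost of a frequency change. A QoS budget, as in the QoS-constrained variant mentioned under Findings, would require an additional constraint that this sketch does not model.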
Taxonomy
Topics: Advanced Bandit Algorithms Research · Smart Grid Energy Management · Recommender Systems and Techniques
