PolyThrottle: Energy-efficient Neural Network Inference on Edge Devices
Minghao Yan, Hongyi Wang, Shivaram Venkataraman

TL;DR
PolyThrottle optimizes the hardware configuration (GPU, memory, and CPU frequency) for neural network inference on edge devices, saving up to 36% of energy on popular models while maintaining performance.
Contribution
The paper introduces a constrained Bayesian optimization approach that tunes hardware settings for energy-efficient neural network inference on edge devices.
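To make the idea concrete, here is a minimal sketch of constrained Bayesian optimization over a small grid of frequency settings. It is not the authors' implementation. The `measure` function is a hypothetical toy model standing in for real on-device energy and latency measurements, the two knobs (`gpu`, `mem`) are normalized frequencies chosen for illustration, and the acquisition rule (expected improvement in energy multiplied by the probability of meeting the latency budget) is one common choice that this summary does not confirm PolyThrottle uses.

```python
import math
import random

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two configurations."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2 * length ** 2))

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_chol(L, b):
    """Solve (L L^T) x = b by forward then backward substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(X, y, noise=1e-3):
    """Fit a GP on standardized targets; return a predictor giving (mean, std)."""
    m = sum(y) / len(y)
    s = math.sqrt(sum((v - m) ** 2 for v in y) / len(y)) or 1.0
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    alpha = solve_chol(L, [(v - m) / s for v in y])

    def predict(x_new):
        k = [rbf(a, x_new) for a in X]
        mu = sum(ki * ai for ki, ai in zip(k, alpha))
        var = max(1.0 - sum(ki * vi for ki, vi in zip(k, solve_chol(L, k))), 1e-12)
        return m + s * mu, s * math.sqrt(var)
    return predict

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)

def measure(cfg):
    """Hypothetical toy model (assumption): returns (energy, latency), arbitrary units."""
    gpu, mem = cfg
    latency = 0.6 / gpu + 0.4 / mem
    power = 0.3 + gpu ** 2 + 0.5 * mem ** 2
    return power * latency, latency

def constrained_bo(candidates, latency_budget, n_init=4, n_iter=15, seed=0):
    rng = random.Random(seed)
    default = (1.0, 1.0)  # maximum frequencies, assumed to be the starting setting
    tried = [default] + rng.sample([c for c in candidates if c != default], n_init)
    obs = [measure(c) for c in tried]
    for _ in range(n_iter):
        energy_gp = fit_gp(tried, [e for e, _ in obs])
        latency_gp = fit_gp(tried, [l for _, l in obs])
        feasible = [e for e, l in obs if l <= latency_budget]
        best = min(feasible) if feasible else None

        def acquisition(c):
            mu_l, sd_l = latency_gp(c)
            p_ok = norm_cdf((latency_budget - mu_l) / sd_l)  # P(latency within budget)
            if best is None:
                return p_ok  # no feasible point yet: search for feasibility first
            mu_e, sd_e = energy_gp(c)
            z = (best - mu_e) / sd_e
            ei = (best - mu_e) * norm_cdf(z) + sd_e * norm_pdf(z)
            return ei * p_ok

        nxt = max((c for c in candidates if c not in tried), key=acquisition)
        tried.append(nxt)
        obs.append(measure(nxt))
    # Lowest-energy configuration among those that met the latency budget.
    return min((e, l, c) for c, (e, l) in zip(tried, obs) if l <= latency_budget)

if __name__ == "__main__":
    grid = [(g / 10, m / 10) for g in range(2, 11) for m in range(2, 11)]
    energy, latency, cfg = constrained_bo(grid, latency_budget=1.5)
    print(f"best config {cfg}: energy={energy:.3f}, latency={latency:.3f}")
```

The sketch evaluates at most 20 of the 81 grid points. On a real device, `measure` would be replaced by applying the frequency settings and recording energy and latency for a batch of inference requests.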
Findings
PolyThrottle saves up to 36% of energy for popular models.
PolyThrottle quickly finds near-optimal hardware configurations.
The empirical evaluation uncovers new aspects of the energy-performance trade-off in edge inference.
Abstract
As neural networks (NNs) are deployed across diverse sectors, their energy demand correspondingly grows. While several prior works have focused on reducing energy consumption during training, the continuous operation of ML-powered systems leads to significant energy use during inference. This paper investigates how the configuration of on-device hardware elements such as GPU, memory, and CPU frequency, often neglected in prior studies, affects energy consumption for NN inference with regular fine-tuning. We propose PolyThrottle, a solution that optimizes configurations across individual hardware components using Constrained Bayesian Optimization in an energy-conserving manner. Our empirical evaluation uncovers novel facets of the energy-performance equilibrium, showing that we can save up to 36 percent of energy for popular models. We also validate that PolyThrottle can quickly converge…
Taxonomy
Topics: Advanced Neural Network Applications · Green IT and Sustainability · Age of Information Optimization
