TL;DR
This paper introduces GPOEO, an online framework that reduces GPU energy consumption during machine learning training by dynamically trading off energy use against training time. It reports average energy savings of 16.2% for an average training-time increase of 5.1%.
Contribution
GPOEO is a novel online energy optimization framework that employs multi-objective prediction and search techniques to reduce GPU energy consumption during training workloads.
Findings
Achieves 16.2% average energy savings
Increases training time by only 5.1% on average
Effective across diverse machine learning workloads
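To put the two headline figures side by side, the short calculation below uses the identity energy = average power × time to estimate what they imply for average power draw. This is only a rough illustration: both numbers are averages across workloads, so the ratio of averages is not the same as a per-workload result, and the variable names are ours rather than the paper's.

```python
# Rough back-of-the-envelope check using energy = average power * time.
# Inputs are the averages quoted in the Findings list; combining them this
# way is an approximation, not a result reported by the paper.
energy_saving = 0.162   # 16.2% average energy savings
time_increase = 0.051   # 5.1% average training-time increase

energy_ratio = 1.0 - energy_saving   # energy relative to baseline
time_ratio = 1.0 + time_increase     # training time relative to baseline

# Average power relative to baseline follows from E = P * T.
power_ratio = energy_ratio / time_ratio

print(f"Implied average power: {power_ratio:.1%} of baseline")
```

Under these assumptions the implied average power draw is roughly 80% of the baseline, which is consistent with running the GPU at a lower-power configuration for slightly longer.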
Abstract
GPUs are widely used to accelerate the training of machine learning workloads. As modern machine learning models become increasingly large, they take longer to train, leading to higher GPU energy consumption. This paper presents GPOEO, an online GPU energy optimization framework for machine learning training workloads. GPOEO dynamically determines the optimal energy configuration by employing novel techniques for online measurement, multi-objective prediction modeling, and search optimization. To characterize the target workload behavior, GPOEO utilizes GPU performance counters. To reduce the performance counter profiling overhead, it uses an analytical model to detect the training iteration change and only collects performance counter data when an iteration shift is detected. GPOEO employs multi-objective models based on gradient boosting and a local search algorithm to find…
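The combination of a prediction model and a local search described in the abstract can be sketched as a simple hill climb over clock settings. Everything below is an illustrative assumption rather than GPOEO's actual implementation: `toy_predict` is a hand-written stand-in for the gradient-boosting models, the "gear" indices for SM and memory clocks are hypothetical, and the objective (minimize predicted energy subject to a cap on predicted slowdown) is one plausible way to balance the two goals.

```python
from typing import Callable, Tuple

Config = Tuple[int, int]  # (sm_gear, mem_gear): hypothetical clock-level indices

SM_GEARS = 10   # assumed number of SM clock levels
MEM_GEARS = 4   # assumed number of memory clock levels


def toy_predict(cfg: Config) -> Tuple[float, float]:
    """Stand-in predictor (NOT the paper's model).

    Returns (energy_ratio, time_ratio) relative to the highest-clock config.
    """
    sm, mem = cfg
    f = 0.55 + 0.05 * sm    # SM clock as a fraction of maximum (0.55..1.0)
    m = 0.70 + 0.10 * mem   # memory clock as a fraction of maximum (0.7..1.0)
    time_ratio = 0.2 + 0.6 / f + 0.2 / m
    power_ratio = 0.2 + 0.6 * f ** 3 + 0.2 * m
    return power_ratio * time_ratio, time_ratio


def local_search(
    predict: Callable[[Config], Tuple[float, float]],
    start: Config,
    max_slowdown: float = 0.05,
) -> Tuple[Config, float]:
    """Hill-climb to a config with lower predicted energy within the slowdown cap."""

    def cost(cfg: Config) -> float:
        energy, time = predict(cfg)
        # Configs that exceed the allowed slowdown are treated as infeasible.
        return energy if time <= 1.0 + max_slowdown else float("inf")

    current, current_cost = start, cost(start)
    while True:
        sm, mem = current
        # Neighbors differ by one gear in exactly one dimension.
        neighbors = [
            (sm + ds, mem + dm)
            for ds, dm in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= sm + ds < SM_GEARS and 0 <= mem + dm < MEM_GEARS
        ]
        best = min(neighbors, key=cost)
        if cost(best) >= current_cost:
            return current, current_cost  # no neighbor improves: local optimum
        current, current_cost = best, cost(best)


# Start from the highest clocks, i.e. the assumed default configuration.
best_cfg, best_energy = local_search(toy_predict, (SM_GEARS - 1, MEM_GEARS - 1))
print(best_cfg, round(best_energy, 3))
```

A local search only needs predictions for a handful of neighboring configurations per step, which is why it suits an online setting where evaluating every clock combination would be too costly. The trade-off is that it can stop at a local optimum rather than the global one.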
