Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
Jie You, Jae-Won Chung, Mosharaf Chowdhury

TL;DR
Zeus is an optimization framework that navigates the tradeoff between GPU energy consumption and training performance for recurring deep neural network (DNN) training jobs. It automatically tunes job- and GPU-level configurations using online exploration and just-in-time energy profiling.
Contribution
This paper introduces Zeus, an online optimization framework that automatically finds job- and GPU-level configurations balancing energy consumption and training performance for recurring DNN training jobs, without requiring expensive offline profiling.
Findings
Zeus improves energy efficiency by up to 75.8%.
Zeus adapts to data drifts over time.
Zeus navigates the tradeoff between energy consumption and training performance rather than optimizing for training speed alone.
Abstract
Training deep neural networks (DNNs) is becoming increasingly more resource- and energy-intensive every year. Unfortunately, existing works primarily focus on optimizing DNN training for faster completion, often without considering the impact on energy efficiency. In this paper, we observe that common practices to improve training performance can often lead to inefficient energy usage. More importantly, we demonstrate that there is a tradeoff between energy consumption and performance optimization. To this end, we propose Zeus, an optimization framework to navigate this tradeoff by automatically finding optimal job- and GPU-level configurations for recurring DNN training jobs. Zeus uses an online exploration-exploitation approach in conjunction with just-in-time energy profiling, averting the need for expensive offline measurements, while adapting to data drifts over time. Our…
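The exploration-exploitation idea in the abstract can be illustrated with a small sketch. This is not Zeus's actual algorithm or API: it swaps in a plain epsilon-greedy bandit as a stand-in, and the cost function (a weighted blend of energy and time controlled by a hypothetical knob `eta`), the `simulate_job` model, and all numbers are assumptions made for illustration only. Each "recurrence" of a training job tries one (batch size, GPU power limit) configuration, observes its energy and time, and updates a running cost estimate.

```python
import random


def weighted_cost(energy_j, time_s, eta, max_power_w):
    """Blend energy (J) and time (s) into one cost, expressed in joules.

    Assumed form: eta=1 weighs energy only; eta=0 weighs time only,
    with time scaled by the maximum power so the units match.
    """
    return eta * energy_j + (1.0 - eta) * max_power_w * time_s


def simulate_job(config, rng):
    """Hypothetical stand-in for running one training job and measuring it."""
    batch_size, power_limit_w = config
    # Synthetic model: a lower power limit slows training down.
    time_s = 1000.0 * (64 / batch_size) ** 0.5 * (250 / power_limit_w) ** 0.7
    time_s *= rng.uniform(0.95, 1.05)  # measurement noise
    energy_j = 0.9 * power_limit_w * time_s  # assume draw stays near the limit
    return energy_j, time_s


class EpsilonGreedyTuner:
    """Epsilon-greedy search over a finite set of configurations."""

    def __init__(self, configs, epsilon=0.1, seed=0):
        self.configs = list(configs)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in self.configs}
        self.mean_cost = {c: 0.0 for c in self.configs}

    def choose(self):
        # Try every configuration once before exploiting.
        for c in self.configs:
            if self.counts[c] == 0:
                return c
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.configs)  # explore
        return min(self.configs, key=lambda c: self.mean_cost[c])  # exploit

    def observe(self, config, cost):
        # Incremental update of the running mean cost.
        self.counts[config] += 1
        self.mean_cost[config] += (cost - self.mean_cost[config]) / self.counts[config]

    def best(self):
        tried = [c for c in self.configs if self.counts[c] > 0]
        return min(tried, key=lambda c: self.mean_cost[c])


def tune(eta, recurrences=50, seed=0):
    """Run the tuner over repeated (simulated) recurrences of the same job."""
    rng = random.Random(seed)
    configs = [(b, p) for b in (64, 128) for p in (100, 175, 250)]
    tuner = EpsilonGreedyTuner(configs, epsilon=0.1, seed=seed)
    for _ in range(recurrences):
        config = tuner.choose()
        energy_j, time_s = simulate_job(config, rng)
        tuner.observe(config, weighted_cost(energy_j, time_s, eta, max_power_w=250))
    return tuner.best()


if __name__ == "__main__":
    print("energy-focused:", tune(eta=1.0))
    print("time-focused:  ", tune(eta=0.0))
```

Under this synthetic model, an energy-only cost (`eta=1.0`) settles on the lowest power limit, while a time-only cost (`eta=0.0`) settles on the highest, which mirrors the tradeoff the abstract describes. Because measurements arrive one recurrence at a time, no separate offline profiling pass is needed in the sketch.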
Taxonomy
Topics: Advanced Neural Network Applications · IoT and Edge/Fog Computing · Machine Learning and Data Classification
