A Comprehensive Analysis of Process Energy Consumption on Multi-Socket Systems with GPUs
Luis G. León-Vega, Niccolò Tosato, Stefano Cozzini

TL;DR
This paper introduces two mathematical models to accurately estimate energy consumption of CPU and GPU processes in multi-socket HPC systems, addressing challenges in shared environments like cloud computing.
Contribution
The work presents novel models for process-level energy estimation in HPC systems that do not require process isolation, improving accuracy and applicability in shared environments.
Findings
CPU power prediction error of 1.9%
GPU power prediction with 9.7% relative error
Models enable energy accounting without process isolation
Abstract
Robustly estimating energy consumption in High-Performance Computing (HPC) is essential for assessing the energy footprint of modern workloads, particularly in fields such as Artificial Intelligence (AI) research, development, and deployment. The extensive use of supercomputers for AI training has heightened concerns about energy consumption and carbon emissions. Existing energy estimation tools often assume exclusive use of computing nodes, a premise that becomes problematic with the advent of supercomputers integrating microservices, as seen in initiatives like Acceleration as a Service (XaaS) and cloud computing. This work investigates the impact of executed instructions on overall power consumption, providing insights into the comprehensive behaviour of HPC systems. We introduce two novel mathematical models to estimate a process's energy consumption based on the total node…
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems
