The anachronism of whole-GPU accounting
Igor Sfiligoi, David Schultz, Frank Würthwein, Benedikt Riedel, and Dmitry Y. Mishin

TL;DR
This paper argues that GPU accounting should shift from whole-GPU metrics to core-hour metrics to better reflect performance differences and sharing, supported by extensive empirical measurements across various GPU models and infrastructures.
Contribution
It introduces a new approach to GPU accounting based on core hours and validates it through comprehensive runtime experiments on multiple GPU models and sharing scenarios.
Findings
Whole-GPU accounting is outdated because compute performance varies widely across GPU models.
GPU core hours provide a more accurate measure of compute output.
Sharing at the infrastructure level affects GPU utilization and accounting.
Abstract
NVIDIA has been making steady progress in increasing the compute performance of its GPUs, resulting in order-of-magnitude compute throughput improvements over the years. With several models of GPUs coexisting in many deployments, the traditional accounting method of treating all GPUs as equal no longer reflects compute output. Moreover, for applications that require significant CPU-based compute to complement the GPU-based compute, it is becoming harder and harder to make full use of the newer GPUs, requiring sharing of those GPUs between multiple applications in order to maximize the achievable science output. This further reduces the value of whole-GPU accounting, especially when the sharing is done at the infrastructure level. We thus argue that GPU accounting for throughput-oriented infrastructures should be expressed in GPU core hours, much like it is normally done for…
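To make the contrast between whole-GPU hours and GPU core hours concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes a job is charged the GPU's CUDA core count times wall-clock hours, scaled by the fraction of the GPU the job was given. The names `GpuJob`, `share_fraction`, `gpu_core_hours`, and `whole_gpu_hours` are hypothetical. The core counts are publicly listed CUDA core counts for three NVIDIA models, used here only as sample data.

```python
from dataclasses import dataclass

# Publicly listed CUDA core counts, used here purely as sample data.
CUDA_CORES = {"T4": 2560, "V100": 5120, "A100": 6912}


@dataclass(frozen=True)
class GpuJob:
    """One job record (hypothetical schema for this sketch)."""
    gpu_model: str
    wall_hours: float
    # Assumption: fraction of the GPU assigned to this job when the
    # infrastructure shares one GPU between several applications.
    share_fraction: float = 1.0


def whole_gpu_hours(job: GpuJob) -> float:
    """Traditional accounting: every GPU counts the same."""
    return job.wall_hours


def gpu_core_hours(job: GpuJob) -> float:
    """Core-hour accounting: cores x hours x share of the GPU."""
    if not 0.0 < job.share_fraction <= 1.0:
        raise ValueError("share_fraction must be in (0, 1]")
    return CUDA_CORES[job.gpu_model] * job.wall_hours * job.share_fraction


if __name__ == "__main__":
    jobs = [
        GpuJob("T4", wall_hours=2.0),
        GpuJob("A100", wall_hours=2.0),
        GpuJob("A100", wall_hours=2.0, share_fraction=0.5),
    ]
    for job in jobs:
        print(job.gpu_model, job.share_fraction,
              whole_gpu_hours(job), gpu_core_hours(job))
```

Under whole-GPU accounting all three sample jobs are charged the same 2 hours. Under the core-hour sketch the charge differs by GPU model and by how much of the GPU the job actually received.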
