Instant GPU Efficiency Visibility at Fleet Scale

Connor Pedersen; Dong H. Ahn; Michel Migdal; Collin Neale; Nik Konyuchenko

arXiv:2605.20799 · cs.DC · May 21, 2026

TL;DR

This paper introduces OFU (Overall FLOP Utilization), a hardware-level GPU efficiency metric that tracks AI workload efficiency across GPU generations and numeric precisions without modifying applications.

Contribution

The paper presents OFU, a novel hardware-based GPU efficiency metric that needs no application instrumentation and works across GPU generations and numeric precisions.
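
The abstract says OFU is derived from two on-chip counters, Tensor Pipe Activity and SM clock frequency, but the exact formula is not stated here. The sketch below assumes one plausible form: the mean, over equal-length sampling intervals, of the Tensor pipe's active fraction scaled by the observed SM clock relative to the clock at which peak FLOP/s is specified. Under that assumption the precision-dependent FLOPs-per-cycle constant appears in both achieved and peak FLOP/s and cancels, which is one way a counter-based metric can be precision-agnostic. The `CounterSample` type, its field names, and the `peak_clock_mhz` parameter are hypothetical, and the sketch ignores the Tensor Core clock-domain and non-tensor undercounting effects the authors characterize.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """One sampling interval of the two counters (hypothetical layout)."""
    tensor_pipe_active: float  # fraction of cycles the Tensor pipe was busy, in [0, 1]
    sm_clock_mhz: float        # SM clock observed during the interval


def ofu(samples: list[CounterSample], peak_clock_mhz: float) -> float:
    """Assumed OFU form: mean of tensor activity x (observed clock / peak-spec clock).

    achieved FLOP/s ~ active * clock * flops_per_cycle(precision)
    peak FLOP/s     =          peak_clock * flops_per_cycle(precision)
    flops_per_cycle(precision) cancels in the ratio.
    Assumes all sampling intervals have the same length.
    """
    if not samples:
        raise ValueError("need at least one counter sample")
    if peak_clock_mhz <= 0:
        raise ValueError("peak_clock_mhz must be positive")
    total = sum(s.tensor_pipe_active * s.sm_clock_mhz / peak_clock_mhz for s in samples)
    return total / len(samples)


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    samples = [CounterSample(0.5, 1500.0), CounterSample(0.7, 1500.0)]
    print(round(ofu(samples, peak_clock_mhz=1500.0), 3))  # 0.6
```

A clock running below the peak-spec value lowers OFU even at full Tensor pipe activity, which is why the clock counter is needed alongside the activity counter.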

Findings

1. After tile-quantization correction, OFU predicts application-level MFU to within 2 percentage points.

2. OFU correlates with application-level MFU at r = 0.78 across 608 production training jobs.

3. Deployed on large-scale GPU fleets, OFU detected a 2.5x efficiency regression.
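
The r = 0.78 finding compares one OFU value and one application-reported MFU value per job. A minimal sketch, assuming Pearson's correlation coefficient (the coefficient type is not named here), is shown below. The `divergent_jobs` helper and its 0.10 absolute tolerance are hypothetical; they only illustrate how a large gap between framework-reported MFU and counter-based OFU could mark a job for a FLOPs-accounting check, the kind of miscalculation the abstract says OFU surfaced.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def divergent_jobs(
    job_ids: Sequence[str],
    ofu: Sequence[float],
    mfu: Sequence[float],
    tolerance: float = 0.10,  # hypothetical threshold, absolute utilization units
) -> list[str]:
    """Hypothetical check: jobs whose reported MFU and OFU differ by more than tolerance."""
    return [j for j, o, m in zip(job_ids, ofu, mfu) if abs(m - o) > tolerance]
```

On Python 3.10 and later, `statistics.correlation` computes the same Pearson coefficient; the explicit version above just makes the formula visible.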

Abstract

We present Overall FLOP Utilization (OFU), a hardware-level, precision-agnostic GPU efficiency metric for AI workloads on HPC systems, derived from two on-chip performance counters: Tensor Pipe Activity and SM clock frequency. OFU requires no application instrumentation and works across GPU generations and numeric precisions. Through controlled GEMM experiments on H100 and GB200 across FP16, TF32, FP8, and NVFP4, we characterize five properties of the OFU approximation: tile quantization, floating-point precision scaling, clock sampling noise, Tensor Core clock domains, and non-tensor undercounting. After tile-quantization correction, OFU predicts application-level MFU to within 2 percentage points. Against 608 production training jobs, OFU achieves r = 0.78 correlation with application-level MFU and surfaces two framework-level FLOPs miscalculations. Deployed across large-scale…
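
Tile quantization, the first property listed in the abstract, arises because GEMM kernels compute output in fixed-size tiles: when a matrix dimension is not a multiple of the tile size, the last tile is padded and the Tensor pipe still does the full tile's work. A minimal sketch of a correction, assuming raw OFU counts that padded work as busy and that a single GEMM shape dominates the measured interval, scales OFU by the fraction of issued multiply-accumulates that are useful. The default tile sizes (128 x 128 x 64) and all function names are hypothetical; real kernels choose tiles per shape, precision, and architecture, and the paper's own correction may differ.

```python
def ceil_to(x: int, tile: int) -> int:
    """Round x up to the next multiple of tile."""
    return -(-x // tile) * tile


def tile_efficiency(
    m: int, n: int, k: int, tile_m: int = 128, tile_n: int = 128, tile_k: int = 64
) -> float:
    """Fraction of issued multiply-accumulates that are useful for an (m x k) @ (k x n) GEMM."""
    if min(m, n, k, tile_m, tile_n, tile_k) <= 0:
        raise ValueError("dimensions and tile sizes must be positive")
    useful = m * n * k
    issued = ceil_to(m, tile_m) * ceil_to(n, tile_n) * ceil_to(k, tile_k)
    return useful / issued


def corrected_ofu(raw_ofu: float, m: int, n: int, k: int, **tiles: int) -> float:
    """Scale raw OFU down by tile efficiency (assumes padded work shows up as Tensor pipe activity)."""
    return raw_ofu * tile_efficiency(m, n, k, **tiles)


if __name__ == "__main__":
    # One extra row forces a second, nearly empty row of tiles: efficiency 129/256.
    print(tile_efficiency(129, 128, 64))  # 0.50390625
    print(tile_efficiency(4096, 4096, 4096))  # 1.0
```

For a job mixing many GEMM shapes, each shape's efficiency would need to be weighted by its share of Tensor pipe time; that aggregation is left out of this sketch.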
