DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis

Yunchu Han; Zhaojun Nan; Sheng Zhou; Zhisheng Niu

arXiv:2502.06295 · cs.LG · June 23, 2025

Open Access

TL;DR

This paper presents a DVFS-aware latency model for GPU-based DNN inference, significantly improving accuracy over CPU-based models and enabling substantial reductions in inference time and energy consumption.

Contribution

The paper introduces a novel GPU-specific DVFS-aware inference time model, validated through extensive experiments and outperforming CPU-based models in optimizing DNN inference.

Findings

01. Reduces inference time by at least 66% compared with CPU-based models.

02. Reduces energy consumption by at least 69% compared with CPU-based models.

03. Improves partition policies for cooperative inference.

Abstract

The rapid development of deep neural networks (DNNs) is inherently accompanied by the problem of high computational costs. To tackle this challenge, dynamic voltage frequency scaling (DVFS) is emerging as a promising technology for balancing the latency and energy consumption of DNN inference by adjusting the computing frequency of processors. However, most existing models of DNN inference time are based on the CPU-DVFS technique, and directly applying the CPU-DVFS model to DNN inference on GPUs will lead to significant errors in optimizing latency and energy consumption. In this paper, we propose a DVFS-aware latency model to precisely characterize DNN inference time on GPUs. We first formulate the DNN inference time based on extensive experiment results for different devices and analyze the impact of fitting parameters. Then by dividing DNNs into multiple blocks and obtaining the…

Taxonomy

Topics: Brain Tumor Detection and Classification · Advanced Neural Network Applications · Machine Learning and ELM