Reducing Compute Waste in LLMs through Kernel-Level DVFS

Jeffrey Spaan; Kuan-Hsun Chen; Ana-Lucia Varbanescu

arXiv:2601.08539 · cs.PF · January 14, 2026



TL;DR

This paper introduces a kernel-level DVFS method for LLMs that reduces energy waste with minimal performance impact. It saves up to 14.6% of energy in GPT-3 training and outperforms previous pass-level approaches, which achieve only 2%.

Contribution

The paper presents a novel kernel-level DVFS technique that achieves greater energy savings than pass-level DVFS in LLM training and inference, without notable slowdowns.

Findings

01

Kernel-level DVFS saves up to 14.6% energy in GPT-3 training.

02

Pass-level DVFS achieves only 2% energy reduction.

03

The discovered frequency configurations remain effective under both data and tensor parallelism.
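The following toy Python sketch shows why per-kernel granularity can expose savings that a single per-pass frequency misses. The PROFILE numbers, kernel names, frequency set, and the helpers kernel_level, pass_level and totals are all hypothetical. We assume "pass-level" means one frequency held for every kernel in a pass, and both policies allow no slowdown (tol=0). This is not the authors' method or their frequency-search procedure.

```python
# Toy comparison of pass-level vs kernel-level frequency selection.
# PROFILE holds HYPOTHETICAL per-kernel measurements:
#   core frequency (MHz) -> (runtime in microseconds, average power in watts).
# The numbers are invented for illustration and are not from the paper.
PROFILE = {
    "gemm":      {1980: (1000, 400), 1600: (1200, 300), 1200: (1550, 220)},  # slows down at low f
    "layernorm": {1980: (200, 250),  1600: (200, 200),  1200: (205, 160)},   # mostly insensitive
    "allreduce": {1980: (500, 200),  1600: (500, 160),  1200: (500, 130)},   # insensitive
}
F_MAX = 1980  # baseline: every kernel runs at the maximum frequency


def kernel_energy(table, f):
    """Energy of one kernel at frequency f, in microjoules (us * W)."""
    runtime_us, power_w = table[f]
    return runtime_us * power_w


def kernel_level(profile, tol=0.0):
    """Pick, per kernel, the lowest-energy frequency whose runtime stays
    within (1 + tol) of that kernel's runtime at F_MAX."""
    choice = {}
    for name, table in profile.items():
        budget = table[F_MAX][0] * (1 + tol)
        feasible = [f for f, (t, _) in table.items() if t <= budget]
        choice[name] = min(feasible, key=lambda f: kernel_energy(table, f))
    return choice


def pass_level(profile, tol=0.0):
    """Pick ONE frequency for every kernel in the pass (assumed meaning of
    'pass-level'), bounded by the total pass runtime at F_MAX."""
    freqs = set.intersection(*(set(t) for t in profile.values()))
    budget = sum(t[F_MAX][0] for t in profile.values()) * (1 + tol)
    best_f, best_e = F_MAX, None
    for f in sorted(freqs, reverse=True):
        total_t = sum(t[f][0] for t in profile.values())
        total_e = sum(kernel_energy(t, f) for t in profile.values())
        if total_t <= budget and (best_e is None or total_e < best_e):
            best_f, best_e = f, total_e
    return {name: best_f for name in profile}


def totals(profile, choice):
    """Total (runtime_us, energy_uJ) of a pass under a per-kernel frequency choice."""
    runtime = sum(profile[k][f][0] for k, f in choice.items())
    energy = sum(kernel_energy(profile[k], f) for k, f in choice.items())
    return runtime, energy


if __name__ == "__main__":
    base_t, base_e = totals(PROFILE, {k: F_MAX for k in PROFILE})
    for label, choice in [("pass-level", pass_level(PROFILE)),
                          ("kernel-level", kernel_level(PROFILE))]:
        t, e = totals(PROFILE, choice)
        saving = 100 * (base_e - e) / base_e
        print(f"{label}: {choice} runtime={t}us saving={saving:.1f}%")
```

With these invented numbers, no single frequency below F_MAX keeps the whole pass on time, so the pass-level policy stays at F_MAX and saves nothing. The kernel-level policy instead lowers the frequency only for the two kernels whose runtime does not depend on it, and saves about 8% at identical runtime. The magnitude is an artifact of the toy data. Only the direction illustrates the granularity argument.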

Abstract

The rapid growth of AI has fueled the expansion of accelerator- or GPU-based data centers. However, their rising operational energy consumption has emerged as a critical bottleneck and a major sustainability concern. Dynamic Voltage and Frequency Scaling (DVFS) is a well-known technique for reducing energy consumption, and thus improving energy efficiency; it requires little effort and works with existing hardware. Reducing the energy consumption of training and inference of Large Language Models (LLMs) through DVFS or power capping is feasible: related work has shown that energy savings can be significant, but at the cost of significant slowdowns. In this work, we focus on reducing waste in LLM operations, i.e., reducing energy consumption without losing performance. We propose a fine-grained, kernel-level DVFS approach that explores new frequency configurations, and prove these save…
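One possible way to formalize "reducing energy consumption without losing performance" at kernel granularity is sketched below. The notation is our own assumption, not the paper's. Let t_k(f) and P_k(f) be the runtime and average power of kernel k at core frequency f, let F be the set of supported frequencies, and let f_max be the baseline frequency.

```latex
\begin{align}
E_k(f)   &= P_k(f)\, t_k(f) \\
f_k^{*}  &= \operatorname*{arg\,min}_{f \in \mathcal{F}} E_k(f)
            \quad \text{subject to} \quad t_k(f) \le t_k(f_{\max}) \\
\text{saving} &= 1 - \frac{\sum_k E_k(f_k^{*})}{\sum_k E_k(f_{\max})}
\end{align}
```

Because f_max always satisfies its own constraint, each E_k(f_k^*) is at most E_k(f_max). Under this formulation the saving is therefore never negative, and total runtime never exceeds the baseline. A pass-level variant would instead force one common f for all kernels in a pass.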


Taxonomy

Topics: Parallel Computing and Optimization Techniques · Big Data and Digital Economy · Advanced Neural Network Applications