Energy-Conscious LLM Decoding: Impact of Text Generation Strategies on GPU Energy Consumption
Alireza Nik, Michael A. Riegler, Pål Halvorsen

TL;DR
This study explores how different text decoding strategies in Large Language Models affect GPU energy consumption, revealing significant trade-offs between energy efficiency and output quality across various tasks.
Contribution
This paper is among the first to analyze LLM decoding strategies from an energy-consumption perspective, offering insights for energy-efficient text generation.
Findings
Decoding strategies significantly impact GPU energy usage.
Trade-offs exist between text quality and energy efficiency.
No single decoding method is optimal for all metrics.
Abstract
Decoding strategies significantly influence the quality and diversity of the generated text in Large Language Models (LLMs), yet their impact on computational resources, particularly GPU energy consumption, is insufficiently studied. This paper investigates the relationship between text generation decoding techniques and energy efficiency, focusing on the trade-off between generation quality and GPU energy usage across diverse tasks and decoding configurations. By benchmarking multiple strategies across various tasks, including Translation, Math Problem Solving, Coding, and Open-ended text generation, we reveal how selecting appropriate decoding techniques with their tuned hyperparameters affects text quality and has measurable implications for energy consumption. Our findings show that the choice of decoding strategy can greatly impact GPU energy usage, even when it has a minimal…
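The text above does not say how GPU energy was measured, so the following is only a minimal sketch of one common approach: sample GPU power at known timestamps during generation, integrate the samples over time to get joules, then normalize by the number of generated tokens so that decoding strategies can be compared. The function names `energy_joules` and `joules_per_token` and all numbers are hypothetical illustrations, not the authors' method or data.

```python
from typing import Sequence


def energy_joules(timestamps_s: Sequence[float], power_w: Sequence[float]) -> float:
    """Estimate energy (J) from power samples (W) via trapezoidal integration.

    Assumption: samples are ordered in time and taken while one decoding
    run is executing on the GPU.
    """
    if len(timestamps_s) != len(power_w):
        raise ValueError("timestamps and power samples must have equal length")
    if len(timestamps_s) < 2:
        return 0.0  # a single sample spans no time interval

    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between two consecutive samples.
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def joules_per_token(energy_j: float, n_tokens: int) -> float:
    """Normalize energy by output length so runs of different size compare fairly."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return energy_j / n_tokens


if __name__ == "__main__":
    # Hypothetical power trace: 100 W rising to 200 W over 2 seconds.
    e = energy_joules([0.0, 2.0], [100.0, 200.0])
    print(e, joules_per_token(e, 150))
```

Reporting joules per token next to a task-specific quality score is one way to make the quality versus energy trade-off visible for each decoding configuration.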
Taxonomy
Topics: Topic Modeling
