Towards Green AI: Decoding the Energy of LLM Inference in Software Development
Lola Solovyeva, Fernando Castor

TL;DR
This paper analyzes the energy consumption of large language model inference during software development, revealing phase-specific patterns and proposing a method to significantly reduce energy use by suppressing babbling behavior.
Contribution
It provides a detailed phase-level analysis of LLM inference energy costs and introduces babbling suppression to cut energy consumption by up to 89%.
Findings
Prefill costs influence decoding energy consumption.
Babbling behavior inflates energy use and can be suppressed.
Babbling suppression achieves energy savings of up to 89% without loss of accuracy.
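The idea behind suppressing babbling can be illustrated with a small sketch. It assumes that "babbling" means the model keeps generating after the requested code is already complete, and that the excess can be detected with stop markers such as the start of a new top-level definition. This summary does not describe the authors' actual mechanism, so the helper `suppress_babbling` and its `stop_markers` parameter are hypothetical names for one plausible heuristic, not the paper's method.

```python
from typing import Iterable, List


def suppress_babbling(token_stream: Iterable[str], stop_markers: List[str]) -> str:
    """Accumulate streamed text and stop as soon as a stop marker appears.

    Hypothetical helper: stop_markers are strings assumed to signal that the
    model has moved past the requested function (for example, a new
    top-level definition).
    """
    text = ""
    for piece in token_stream:
        text += piece
        # Locate every stop marker that occurs in the accumulated output.
        cut_points = [text.find(m) for m in stop_markers if m in text]
        if cut_points:
            # Truncate at the earliest marker and stop consuming the stream,
            # so no further decoding steps are requested.
            return text[:min(cut_points)]
    return text


if __name__ == "__main__":
    pieces = ["    return a", " + b\n", "\ndef ", "unrelated():\n", "    pass\n"]
    print(repr(suppress_babbling(iter(pieces), ["\ndef ", "\nclass "])))
```

Because the function returns as soon as a marker is seen, a lazily evaluated token stream is never asked for the remaining tokens, which is where any decoding-phase saving would come from in this sketch.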
Abstract
Context: AI-assisted tools are increasingly integrated into software development workflows, but their reliance on large language models (LLMs) introduces substantial computational and energy costs. Understanding and reducing the energy footprint of LLM inference is therefore essential for sustainable software development. Objective: In this study, we conduct a phase-level analysis of LLM inference energy consumption, distinguishing between (1) the prefill phase, where the model processes the input and builds internal representations, and (2) the decoding phase, where output tokens are generated using the stored state. Method: We investigate six 6B-7B and four 3B-4B transformer-based models, evaluating them on two code-centric benchmarks: HumanEval for code generation and LongBench for code understanding. Results: Our findings show that, within both parameter groups, models exhibit distinct energy patterns…
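A phase-level breakdown like the one described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' measurement setup: it assumes power readings are available as (timestamp, watts) samples and that the arrival time of the first output token marks the boundary between prefill and decoding. The names `energy_joules`, `phase_energy`, and `t_first_token` are hypothetical.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, power in watts)


def energy_joules(samples: List[Sample], start: float, end: float) -> float:
    """Trapezoidal integral of power over the samples that fall in [start, end]."""
    window = [(t, p) for t, p in samples if start <= t <= end]
    total = 0.0
    for (t0, p0), (t1, p1) in zip(window, window[1:]):
        total += 0.5 * (p0 + p1) * (t1 - t0)
    return total


def phase_energy(
    samples: List[Sample], t_start: float, t_first_token: float, t_end: float
) -> Dict[str, float]:
    """Split energy into prefill and decoding.

    Assumption: the first output token marks the end of prefill.
    """
    return {
        "prefill": energy_joules(samples, t_start, t_first_token),
        "decoding": energy_joules(samples, t_first_token, t_end),
    }


if __name__ == "__main__":
    readings = [(0.0, 100.0), (1.0, 100.0), (2.0, 200.0), (3.0, 200.0)]
    print(phase_energy(readings, t_start=0.0, t_first_token=1.0, t_end=3.0))
```

The sketch only integrates samples that lie inside each window, so a coarse sampling interval will under-count short phases such as prefill on small prompts.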
Taxonomy
Topics: Software Engineering Research · Green IT and Sustainability · Machine Learning in Materials Science
