Babbling Suppression: Making LLMs Greener One Token at a Time
Lola Solovyeva, Fernando Castor

TL;DR
This paper introduces Babbling Suppression, a method that reduces unnecessary token generation by large language models during code generation. It integrates test execution into the generation process and stops as soon as the output passes the tests, which yields significant energy savings without sacrificing accuracy.
Contribution
It proposes a practical, model-agnostic approach that minimizes babbling in LLMs by terminating generation once the generated code passes all tests, improving efficiency and sustainability.
Findings
Babbling occurs in all tested models and is more pronounced in Java than in Python.
Babbling Suppression reduces energy consumption by up to 65%.
The number of generated tokens decreases in most cases, and the added overhead is minimal.
Abstract
Context: Large Language Models (LLMs) are increasingly used in modern software development, aiding in code generation, code completion, and refactoring through AI-powered assistants. While they accelerate development workflows, they often produce extraneous output, referred to as "babbling", which incurs additional cognitive, economic, and energy costs.
Objective: This work investigates the problem of babbling in LLM-based code generation and proposes a practical, model-agnostic approach to reduce unnecessary output without compromising solution accuracy.
Method: We introduce Babbling Suppression (BS), a method that integrates test execution into the LLM generation process by evaluating intermediate outputs and terminating generation once a solution passes all tests. This prevents excessive token generation while having no impact on model accuracy. An empirical study was conducted…
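The Method description above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the model's output arrives as a stream of text chunks and that a hypothetical `passes_tests` callback can run the task's tests on partial output. It also assumes a hypothetical policy of running the tests only at line ends. The names `make_test_checker` and `generate_with_babbling_suppression` are illustrative only.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def make_test_checker(test_code: str) -> Callable[[str], bool]:
    """Return a hypothetical checker that runs `test_code` against a candidate.

    WARNING: exec() runs untrusted model output; a real harness should use a sandbox.
    """
    def check(candidate: str) -> bool:
        namespace: Dict[str, object] = {}
        try:
            exec(candidate, namespace)   # define the candidate's functions
            exec(test_code, namespace)   # run the assertions against them
        except Exception:                # SyntaxError, AssertionError, runtime errors
            return False
        return True
    return check


def generate_with_babbling_suppression(
    token_stream: Iterable[str],
    passes_tests: Callable[[str], bool],
    is_checkpoint: Callable[[str], bool] = lambda tok: "\n" in tok,
) -> Tuple[str, int, bool]:
    """Consume tokens until the accumulated output passes all tests.

    Returns (generated_text, tokens_consumed, passed).
    """
    pieces: List[str] = []
    count = 0
    for tok in token_stream:
        pieces.append(tok)
        count += 1
        # Assumption: tests run only at checkpoints (here, line ends) to limit overhead.
        if is_checkpoint(tok):
            code = "".join(pieces)
            if passes_tests(code):
                # Suppress babbling: stop pulling tokens as soon as the tests pass.
                return code, count, True
    # Stream exhausted without an early stop: test the complete output once more.
    code = "".join(pieces)
    return code, count, passes_tests(code)


if __name__ == "__main__":
    # Simulated model output: a correct function followed by "babble".
    tokens = [
        "def add(a, b):\n",
        "    return a + b\n",
        "\n",
        "# Example usage:\n",
        "print(add(2, 3))\n",
    ]
    checker = make_test_checker("assert add(2, 3) == 5\nassert add(-1, 1) == 0")
    code, n, ok = generate_with_babbling_suppression(iter(tokens), checker)
    print(n, ok)  # → 2 True
```

Checking at every line end is only one possible trade-off between how quickly a passing solution is detected and how often the tests must be executed. This summary does not specify the checkpoint policy used in the paper.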
