Small Talk, Big Impact: The Energy Cost of Thanking AI
Julien Delavande, Regis Pierrard, Sasha Luccioni

TL;DR
This paper investigates the energy costs associated with polite interactions with large language models, revealing how factors like message length and model size influence energy consumption, and offering insights for more sustainable AI use.
Contribution
It provides a detailed quantification of the energy impact of polite language in LLM interactions, using real-world data and fine-grained measurements.
Findings
Longer inputs and outputs increase energy use
Model size correlates with higher energy consumption
Politeness adds measurable energy costs
Abstract
Being polite is free - or is it? In this paper, we quantify the energy cost of seemingly innocuous messages such as "thank you" when interacting with large language models, often used by users to convey politeness. Using real-world conversation traces and fine-grained energy measurements, we quantify how input length, output length and model size affect energy use. While politeness is our motivating example, it also serves as a controlled and reproducible proxy for measuring the energy footprint of a typical LLM interaction. Our findings provide actionable insights for building more sustainable and efficient LLM applications, especially in increasingly widespread real-world contexts like chat. As user adoption grows and billions of prompts are processed daily, understanding and mitigating this cost becomes crucial - not just for efficiency, but for sustainable AI deployment.
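To make the abstract's scaling argument concrete, the following minimal sketch converts a per-message energy cost into a daily total. The function name and both input values are hypothetical placeholders chosen for illustration; they are not measurements from the paper.

```python
def daily_energy_kwh(energy_per_message_wh: float, messages_per_day: float) -> float:
    """Return the total daily energy in kWh for a per-message cost given in Wh."""
    # 1 kWh = 1000 Wh
    return energy_per_message_wh * messages_per_day / 1000.0


# Hypothetical inputs (assumptions, not figures reported by the authors).
per_message_wh = 0.2              # assumed energy per short message, in Wh
messages_per_day = 1_000_000_000  # assumed daily message volume

total_kwh = daily_energy_kwh(per_message_wh, messages_per_day)
print(f"{total_kwh:,.0f} kWh per day")
```

Because the relationship is a simple product, the total scales linearly with either input, which is why small per-message costs can matter at large volumes.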
Peer Reviews
Decision: Submitted to ICLR 2026
**-- Systematic approach in measuring the energy efficiency of LLMs.** The authors separately measure the energy consumption across devices (CPU, GPU, RAM), LLM phases (infilling vs decoding). Such a study is conducted across different model families and sizes, confirming a linear energy cost growth for output < 10k tokens and quadratic growth for output > 10k tokens. **-- Clear decomposition of energy sources.** Separating infilling vs. decoding energy provides a useful mental model for practi
**-- Title–content mismatch.** The title frames the paper as a study of “thanking AI,” but the body mostly describes generic token-cost modeling. The interesting hook (are polite/add-on tokens worth it?) is not really answered. **-- No evaluation of performance vs. cost.** The core question should be: does removing “useless” or low-utility tokens (e.g., “thank you”) save the energy bill at any cost of output quality/user satisfaction? Based on how this paper is written, it seems the authors bel
1. Novel, reproducible problem formulation: Treating “thank you” as a controlled micro-interaction unit is an elegant proxy for LLM inference energy. 2. Clear phase separation: Prefill vs. decode decomposition (Figures 2–3) maps energy use to architectural structure. 3. Quantitative analytical model: Fitted latency/energy formulas (Section 5) with numeric coefficients allow predictive estimation. 4. I think that this is a very interesting topic.
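As a rough illustration of what a fitted energy formula of this kind might look like, the sketch below fits a linear model of energy against input and output token counts by least squares. The paper's actual Section 5 formulas and coefficients are not reproduced here: the linear form, the function names, and the synthetic data are all assumptions, and a linear form would at best apply to the shorter-output regime that another review describes as linear.

```python
import numpy as np


def fit_energy_model(n_in, n_out, energy_wh):
    """Least-squares fit of energy ~ a + b * n_in + c * n_out (assumed form)."""
    n_in = np.asarray(n_in, dtype=float)
    n_out = np.asarray(n_out, dtype=float)
    # Design matrix: intercept column, input tokens, output tokens.
    X = np.column_stack([np.ones_like(n_in), n_in, n_out])
    coeffs, *_ = np.linalg.lstsq(X, np.asarray(energy_wh, dtype=float), rcond=None)
    return coeffs  # array (a, b, c)


def predict_energy(coeffs, n_in, n_out):
    """Predict energy in Wh from fitted coefficients."""
    a, b, c = coeffs
    return a + b * n_in + c * n_out


# Synthetic, noise-free data generated from made-up coefficients (not paper values).
TRUE_COEFFS = (0.01, 2e-5, 3e-4)
rng = np.random.default_rng(0)
n_in = rng.integers(10, 2000, size=50)
n_out = rng.integers(10, 1000, size=50)
energy = TRUE_COEFFS[0] + TRUE_COEFFS[1] * n_in + TRUE_COEFFS[2] * n_out

coeffs = fit_energy_model(n_in, n_out, energy)
print("fitted (a, b, c):", coeffs)
print("predicted Wh for 100 input / 50 output tokens:", predict_energy(coeffs, 100, 50))
```

With real measurements the fit would include noise, so residuals and a held-out check would be needed before trusting the coefficients for prediction.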
1. They use pyRAPL (Intel RAPL counters) on AMD EPYC 7R13, which lacks compatible energy registers. No adaptation or external calibration is reported. Thus CPU energy (and total 0.245 Wh) lacks credibility. Authors must explain or replace with physical power-meter data. 2. All results use FP32 precision, though real deployments use FP16/BF16/FP8/INT8. FP32 exaggerates power and latency. Reported numbers overstate real-world costs. Authors should test mixed-precision or clarify limitation. 3. “Pr
- The paper is well written and generally easy to follow. - Provides detailed measurements of energy consumption across model sizes and computation phases.
- The paper claims to provide “actionable insights for building more sustainable and efficient LLM applications,” but the discussion does not articulate what these actionable insights are beyond the general assumption of reducing computation. - The pre-fill phase should not be included in the measured energy cost of saying “thank you,” since in realistic chat situations, pre-filling is already cached. - Figures 2 and 3 include a “generation” category that is not defined or discussed in the tex
Taxonomy
Topics: AI in Service Interactions · Speech and dialogue systems · Topic Modeling
