The Compression Paradox in LLM Inference: Provider-Dependent Energy Effects of Prompt Compression
Warren Johnson

TL;DR
This study investigates how prompt compression affects the energy efficiency of large language model inference. It finds that the effects depend on the provider and that token reduction alone does not reliably save energy.
Contribution
It provides the first large-scale empirical analysis of prompt compression's impact on energy consumption across multiple providers and benchmarks, highlighting provider-specific behaviors.
Findings
DeepSeek shows output expansion and increased energy use under compression.
GPT-4o-mini exhibits mixed energy effects, including reductions at certain compression ratios.
Prompt compression often causes significant quality loss, and its energy effects are provider-dependent.
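The paper estimates energy with a token-based proxy. The following is a minimal sketch of why fewer input tokens need not mean less energy under such a proxy. The per-token costs, the `Trial` type, and the token counts are all hypothetical values chosen for illustration; they are not the paper's calibrated coefficients or measured results.

```python
from dataclasses import dataclass

# Hypothetical per-token energy costs in joules (illustrative only).
# Assumption: generating an output token costs more than reading an input token.
INPUT_J_PER_TOKEN = 1.0
OUTPUT_J_PER_TOKEN = 4.0


@dataclass
class Trial:
    """Token counts for one API call (hypothetical record type)."""
    input_tokens: int
    output_tokens: int


def estimate_energy_joules(trial: Trial) -> float:
    """Linear token-based proxy: input and output tokens weighted separately."""
    return (trial.input_tokens * INPUT_J_PER_TOKEN
            + trial.output_tokens * OUTPUT_J_PER_TOKEN)


def energy_ratio(baseline: Trial, compressed: Trial) -> float:
    """Energy of the compressed call relative to the baseline (below 1.0 is a saving)."""
    return estimate_energy_joules(compressed) / estimate_energy_joules(baseline)


# Made-up token counts: the prompt is halved in both compressed cases.
baseline = Trial(input_tokens=1000, output_tokens=100)
compressed_same_output = Trial(input_tokens=500, output_tokens=100)
compressed_expanded = Trial(input_tokens=500, output_tokens=400)

print(energy_ratio(baseline, compressed_same_output))  # below 1.0: a saving
print(energy_ratio(baseline, compressed_expanded))     # above 1.0: a net increase
```

In this toy setup, halving the prompt saves energy only while the output length stays put; once the response grows, the output term dominates and the compressed call costs more than the baseline.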
Abstract
The rapid proliferation of Large Language Models has created an environmental paradox: the very technology that could help solve climate challenges is itself becoming a significant contributor to global carbon emissions. We test whether prompt compression improves inference energy efficiency in 28,421 successful API trials (28,428 planned) across three providers (OpenAI GPT-4o-mini, Anthropic Claude-3.5-Sonnet, and DeepSeek-Chat), five benchmarks (HumanEval, MBPP, GSM8K, MATH, MMLU), and four compression ratios (r in {1.0, 0.7, 0.5, 0.3}). Energy is estimated with a token-based proxy calibrated against local direct measurements, and quality is tracked with benchmark pass rates. Compression produced substantial quality loss (overall pass rate 26.0% at baseline vs. 1.5% at r=0.7) and strongly provider-dependent energy behavior. DeepSeek exhibited output expansion under compression (21 to…
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
