The Compression Paradox in LLM Inference: Provider-Dependent Energy Effects of Prompt Compression

Warren Johnson

arXiv:2603.23528·cs.CL·March 26, 2026

Open Access

TL;DR

This study investigates how prompt compression affects the energy efficiency of large language model inference. It finds that the energy effects depend on the provider and that token reduction alone does not reliably save energy.

Contribution

The paper provides the first large-scale empirical analysis of how prompt compression affects energy consumption across multiple providers and benchmarks, highlighting provider-specific behaviors.

Findings

01. DeepSeek shows output expansion and increased energy use under compression.

02. GPT-4o-mini exhibits mixed energy effects, including reductions at certain compression ratios.

03. Prompt compression often causes significant quality loss, and its energy effects are provider-dependent.

Abstract

The rapid proliferation of Large Language Models has created an environmental paradox: the very technology that could help solve climate challenges is itself becoming a significant contributor to global carbon emissions. We test whether prompt compression improves inference energy efficiency in 28,421 successful API trials (28,428 planned) across three providers (OpenAI GPT-4o-mini, Anthropic Claude-3.5-Sonnet, and DeepSeek-Chat), five benchmarks (HumanEval, MBPP, GSM8K, MATH, MMLU), and four compression ratios (r in {1.0, 0.7, 0.5, 0.3}). Energy is estimated with a token-based proxy calibrated against local direct measurements, and quality is tracked with benchmark pass rates. Compression produced substantial quality loss (overall pass rate 26.0% at baseline vs. 1.5% at r=0.7) and strongly provider-dependent energy behavior. DeepSeek exhibited output expansion under compression (21 to…
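To make the paradox concrete, the sketch below shows how a token-based energy proxy can report higher energy after compression when the output grows. It is only an illustration under stated assumptions: the linear form of the proxy, the `TokenEnergyProxy` and `energy_change` names, the per-token coefficients, and the token counts are all hypothetical placeholders, not the paper's calibrated proxy or its measured data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenEnergyProxy:
    """Hypothetical linear proxy: energy as a weighted sum of token counts."""

    joules_per_input_token: float
    joules_per_output_token: float

    def estimate(self, input_tokens: int, output_tokens: int) -> float:
        # Assumed form: each token contributes a fixed energy cost.
        return (
            self.joules_per_input_token * input_tokens
            + self.joules_per_output_token * output_tokens
        )


def energy_change(
    proxy: TokenEnergyProxy,
    base_in: int,
    base_out: int,
    comp_in: int,
    comp_out: int,
) -> float:
    """Relative change in estimated energy, compressed prompt vs. baseline."""
    base = proxy.estimate(base_in, base_out)
    comp = proxy.estimate(comp_in, comp_out)
    return (comp - base) / base


# Placeholder coefficients (NOT from the paper); output tokens are assumed
# to cost more than input tokens.
proxy = TokenEnergyProxy(joules_per_input_token=0.25, joules_per_output_token=1.0)

# Illustrative token counts only: the prompt is halved (r = 0.5) in both cases.
stable = energy_change(proxy, base_in=1000, base_out=100, comp_in=500, comp_out=100)
expanded = energy_change(proxy, base_in=1000, base_out=100, comp_in=500, comp_out=400)

print(f"output unchanged: {stable:+.1%}")
print(f"output expanded:  {expanded:+.1%}")
```

With the output length held fixed, the estimate falls; when the compressed prompt triggers a longer response, the extra output tokens outweigh the input savings and the estimate rises. Under a proxy of this shape, that is why a lower input token count does not by itself imply lower energy.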

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques