Compression Method Matters: Benchmark-Dependent Output Dynamics in LLM Prompt Compression

Warren Johnson

arXiv:2603.23527 · cs.CL · March 26, 2026

TL;DR

This paper investigates how prompt compression affects output length and inference costs in large language models, revealing benchmark-dependent dynamics and proposing metrics for more reliable evaluation.

Contribution

The paper introduces two metrics, instruction survival probability (Psi) and the Compression Robustness Index (CRI), for assessing how compression effects differ across benchmarks.

Findings

01. Output expansion varies widely across benchmarks: under r=0.3, DeepSeek expands output 56x on MBPP but only 5x on HumanEval.

02. Prompt structure, not provider identity, moderates compression effects.

03. Token savings may overstate actual energy savings.
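
The interaction between input savings and output expansion can be illustrated with simple token arithmetic. The sketch below is not from the paper: the function name, the baseline token counts (1,000 input, 100 output), and the reading of r=0.3 as "keep 30% of input tokens" are all assumptions made for illustration. Only the 5x and 56x expansion factors come from the abstract, and token counts are used here as a rough stand-in for cost.

```python
def net_token_change(input_tokens, output_tokens, keep_ratio, output_expansion):
    """Return (total_before, total_after) token counts for a single call.

    keep_ratio: fraction of input tokens kept after compression
                (ASSUMED meaning of r; the listing does not define it).
    output_expansion: factor by which output length grows after compression.
    """
    before = input_tokens + output_tokens
    after = input_tokens * keep_ratio + output_tokens * output_expansion
    return before, after


# Hypothetical baseline call: 1,000 input tokens and 100 output tokens.
for label, expansion in [("5x expansion", 5.0), ("56x expansion", 56.0)]:
    before, after = net_token_change(1000, 100, 0.3, expansion)
    print(f"{label}: {before} tokens before, {after:.0f} tokens after")
```

Under these assumed numbers, a 5x expansion still leaves the total below the uncompressed baseline, while a 56x expansion pushes it several times higher, even though the input-token reduction is the same 70% in both cases.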

Abstract

Prompt compression is often evaluated by input-token reduction, but its real deployment impact depends on how compression changes output length and total inference cost. We present a controlled replication and extension study of benchmark-dependent output dynamics under aggressive compression, covering 5,400 API calls across three benchmarks and multiple providers. To explain conflicting prior observations, we formalize instruction survival probability (Psi), a structural metric that captures whether task-critical prompt segments remain after truncation. Results show a strong benchmark effect: under r=0.3, DeepSeek exhibits severe output expansion on MBPP (56x, Psi approx 0.15) but substantially lower expansion on HumanEval (5x, Psi approx 0.72), while GPT-4o-mini is comparatively stable across benchmarks. This reconciles the apparent discrepancy between previously reported extreme…
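
The abstract describes Psi only informally, as capturing whether task-critical prompt segments remain after truncation. One plausible reading, sketched below, is the fraction of prompts whose instruction span survives intact. This is an illustrative assumption, not the paper's definition: the function names, the head-keeping truncation rule, and the toy spans are all hypothetical.

```python
import math


def instruction_survives(prompt_length, instruction_span, keep_ratio):
    """True if the whole instruction span lies inside the kept prefix.

    ASSUMPTION: compression keeps the first floor(keep_ratio * length) tokens.
    instruction_span is a half-open (start, end) token index range.
    """
    # Small epsilon guards against floating-point error in the product.
    kept = math.floor(prompt_length * keep_ratio + 1e-9)
    _start, end = instruction_span
    return end <= kept


def survival_probability(examples, keep_ratio):
    """Fraction of (prompt_length, instruction_span) pairs whose span survives."""
    survived = sum(
        instruction_survives(length, span, keep_ratio) for length, span in examples
    )
    return survived / len(examples)


# Toy data: prompt length in tokens and the location of the task instruction.
examples = [
    (10, (0, 2)),   # instruction at the start: survives
    (10, (8, 10)),  # instruction at the end: truncated away
    (20, (4, 6)),   # kept prefix is 6 tokens: survives
    (10, (2, 4)),   # kept prefix is 3 tokens: partially cut
]
print(survival_probability(examples, keep_ratio=0.3))
```

Under this reading, a benchmark whose prompts place the instruction late would score a low Psi under prefix-keeping truncation, which fits the abstract's contrast between MBPP (Psi approx 0.15) and HumanEval (Psi approx 0.72), though the paper's actual computation may differ.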

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Security and Verification in Computing · Green IT and Sustainability