When Less is More: The LLM Scaling Paradox in Context Compression

Ruishan Guo; Yibing Liu; Guoxin Ma; Yan Wang; Yueyang Zhang; Long Xia; Kecheng Chen; Zhiyuan Sun; Daiting Shi

arXiv:2602.09789·cs.LG·May 12, 2026

TL;DR

This paper uncovers a paradox in LLM context compression: scaling up the compressor can reduce the faithfulness of the reconstructed context even as reconstruction error falls, because of knowledge overwriting and semantic drift.

Contribution

It identifies and analyzes the Size-Fidelity Paradox in context compression, revealing how larger models can harm faithful context reconstruction.

Findings

01. Mid-sized compressors outperform larger ones in faithful recovery.
02. Larger models tend to overwrite facts and paraphrase content, reducing fidelity.
03. Compressors organize memory into broader semantic subspaces, increasing ambiguity.

Abstract

Scaling up model parameters has long been a prevalent training paradigm, driven by the assumption that larger models yield superior generation capabilities. However, under lossy context compression in a compressor-decoder setup, we find a Size-Fidelity Paradox: increasing compressor size can lessen the faithfulness of reconstructed contexts even though reconstruction error decreases. Across 27 compressor setups spanning model families, scales, and compression rates, we trace this paradox to two dominant factors: 1) knowledge overwriting: larger models increasingly replace source facts with their own prior beliefs, e.g., "the white strawberry" → "the red strawberry"; and 2) semantic drift: larger models tend to paraphrase or restructure content instead of reproducing it verbatim, e.g., "Alice hit Bob" → "Bob hit…
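To make the gap between surface reconstruction quality and faithfulness concrete, here is a minimal Python sketch. It is not the paper's evaluation code: the bag-of-words F1 metric, the helper names `token_f1` and `is_verbatim`, and the longer example sentence built around the abstract's strawberry phrase are all assumptions chosen for illustration.

```python
from collections import Counter


def token_f1(source: str, reconstruction: str) -> float:
    """Bag-of-words F1 between two strings (a stand-in surface metric,
    not the metric used in the paper)."""
    src = source.lower().split()
    rec = reconstruction.lower().split()
    # Multiset intersection counts tokens shared by both strings.
    overlap = sum((Counter(src) & Counter(rec)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(rec)
    recall = overlap / len(src)
    return 2 * precision * recall / (precision + recall)


def is_verbatim(source: str, reconstruction: str) -> bool:
    """Strict faithfulness check: same tokens in the same order."""
    return source.lower().split() == reconstruction.lower().split()


# Hypothetical context wrapped around the abstract's example phrase.
source = "she picked the white strawberry from the garden"
# Knowledge overwriting: the prior belief "red" replaces the source fact "white".
overwritten = "she picked the red strawberry from the garden"

print(f"token F1: {token_f1(source, overwritten):.3f}")
print(f"verbatim: {is_verbatim(source, overwritten)}")
```

Under these assumptions, seven of the eight tokens still match, so the overlap score stays high while the one fact that mattered has been changed. An aggregate similarity score can therefore look good on a reconstruction that is no longer faithful to the source.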
