Detecting Overflow in Compressed Token Representations for Retrieval-Augmented Generation

Julia Belikova; Danila Rozhevskii; Dennis Svirin; Konstantin Polev; Alexander Panchenko

arXiv:2602.12235 · cs.CL · February 16, 2026

TL;DR

This paper introduces a method to detect when compressed token representations in large language models lose essential information, enabling better management of context length limitations in resource-constrained environments.

Contribution

The paper proposes a methodology for detecting overflow in compressed token representations, moving from query-agnostic saturation statistics to query-aware classifiers.

Findings

01. Query-agnostic saturation statistics can distinguish compressed from uncompressed tokens.

02. Query-aware classifiers achieve 0.72 AUC-ROC in overflow detection.

03. Incorporating query information improves detection accuracy.
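
For readers unfamiliar with the metric in the second finding, the following is a minimal sketch of how AUC-ROC can be computed for a binary overflow detector. It is not the authors' evaluation code: the function name `auc_roc`, the label convention (1 = overflow, 0 = no overflow), and the toy scores are all assumptions made for illustration. AUC-ROC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Sequence


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) estimate of AUC-ROC.

    labels: 1 for the positive class (here, hypothetically, "overflow"), 0 otherwise.
    scores: detector outputs, where higher means "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the paper.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.4, 0.5, 0.7, 0.1]
toy_auc = auc_roc(toy_labels, toy_scores)
print(f"toy AUC-ROC: {toy_auc:.3f}")
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, so the reported 0.72 sits between the two. The quadratic pairwise loop is chosen for clarity; a rank-based implementation would be preferable for large evaluation sets.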

Abstract

Efficient long-context processing remains a crucial challenge for contemporary large language models (LLMs), especially in resource-constrained environments. Soft compression architectures promise to extend effective context length by replacing long token sequences with smaller sets of learned compressed tokens. Yet, the limits of compressibility -- and when compression begins to erase task-relevant content -- remain underexplored. In this paper, we define token overflow as a regime in which compressed representations no longer contain sufficient information to answer a given query, and propose a methodology to characterize and detect it. In the xRAG soft-compression setting, we find that query-agnostic saturation statistics reliably separate compressed from uncompressed token representations, providing a practical tool for identifying compressed tokens but showing limited overflow…

Taxonomy

Topics: Natural Language Processing Techniques · Machine Learning and Algorithms · Topic Modeling