Knowing When to Stop: Efficient Context Processing via Latent Sufficiency Signals

Roy Xie; Junlin Wang; Paul Rosu; Chunyuan Deng; Bolun Sun; Zihao Lin; Bhuwan Dhingra

arXiv:2502.01025 · cs.CL · February 10, 2026

TL;DR

This paper introduces a dynamic context cutoff method for large language models, enabling them to self-terminate processing once sufficient information is obtained, improving efficiency and accuracy.

Contribution

The paper uncovers inherent sufficiency signals in attention heads and leverages them for self-termination, a novel approach to context efficiency in LLMs.

Findings

1. 3.4% accuracy improvement over baselines

2. 1.33x token reduction on average

3. Superior performance compared to other methods at similar token reductions
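To make the 1.33x figure concrete, the short sketch below converts a reduction factor into the share of tokens processed. It assumes that "1.33x token reduction" means the ratio of original tokens to processed tokens; that reading, and the helper name `fraction_processed`, are illustrative assumptions rather than definitions taken from the paper.

```python
def fraction_processed(reduction_factor: float) -> float:
    """Share of the original tokens still processed, assuming
    reduction_factor = original_tokens / processed_tokens (an assumption)."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 / reduction_factor


frac = fraction_processed(1.33)
# prints "75.2% processed, 24.8% skipped"
print(f"{frac:.1%} processed, {1 - frac:.1%} skipped")
```

Under this reading, the model reads roughly three quarters of the context on average.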

Abstract

Large language models (LLMs) process entire input contexts indiscriminately, which is inefficient when the information required to answer a query is localized within the context. We present dynamic context cutoff, a novel method enabling LLMs to self-terminate processing upon acquiring sufficient task-relevant information. Through analysis of model internals, we discover that specific attention heads inherently encode "sufficiency signals" -- detectable through lightweight classifiers -- that predict when critical information has been processed. This reveals a new efficiency paradigm: models' internal understanding naturally dictates processing needs rather than external compression heuristics. Comprehensive experiments across six QA datasets (up to 40K tokens) with three model families (LLaMA/Qwen/Mistral, 1B-70B) demonstrate 3.4% accuracy improvement while achieving 1.33x token…
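The cutoff idea in the abstract can be sketched as a loop that reads the context piece by piece and stops once a lightweight classifier signals sufficiency. This is a minimal toy, not the authors' implementation: the chunked reading, the logistic-regression probe, the 0.5 threshold, and the names `probe_score`, `dynamic_cutoff`, and `toy_featurize` are all assumptions made for illustration. In particular, the stand-in featurizer checks for a keyword, whereas the paper describes reading signals from specific attention heads.

```python
import math
from typing import Callable, List, Sequence


def probe_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Hypothetical lightweight probe: logistic regression over features."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def dynamic_cutoff(
    chunks: List[str],
    featurize: Callable[[List[str]], Sequence[float]],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> int:
    """Return how many chunks were read before the probe signalled sufficiency.

    If the probe never fires, the whole context is read.
    """
    prefix: List[str] = []
    for count, chunk in enumerate(chunks, start=1):
        prefix.append(chunk)
        # In a real system these features would come from model internals;
        # here `featurize` is a stand-in supplied by the caller.
        if probe_score(featurize(prefix), weights, bias) >= threshold:
            return count
    return len(chunks)


def toy_featurize(prefix: List[str]) -> List[float]:
    """Toy stand-in feature: 1.0 once the answer span has been seen."""
    return [1.0 if any("Paris" in c for c in prefix) else 0.0]


chunks = [
    "The weather was mild that year.",
    "The capital of France is Paris.",
    "Unrelated trailing text.",
    "More filler.",
]
used = dynamic_cutoff(chunks, toy_featurize, weights=[4.0], bias=-2.0)
print(f"read {used} of {len(chunks)} chunks")
```

In this toy run the probe score rises above the threshold at the second chunk, so the remaining two chunks are never read. The threshold trades efficiency against the risk of stopping too early.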

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need