SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing

Xinzhi Wang; Peter Baile Chen; Gerardo Vitagliano; Matthew Russo; Jun Chen; Michael Cafarella; Samuel Madden; Chunwei Liu

arXiv:2604.15583 · cs.DB · April 27, 2026

TL;DR

SAGE is a plug-and-play framework that uses attention signals from a lightweight LLM to efficiently select relevant document segments, enabling high-quality question answering with significantly reduced context size.

Contribution

It introduces a training-free, attention-guided context reduction method that improves long-document QA performance without fine-tuning or task- and domain-specific retriever tuning.

Findings

01

Achieves a top-4 ranking on QuALITY-hard using only 10% of the context

02

Reduces token usage by 90% while maintaining accuracy

03

Outperforms traditional reduction techniques on multiple benchmarks

Abstract

Large language models with long context windows can answer complex questions directly from full-length academic, technical, and policy documents, but passing entire documents is often costly and slow, and it can degrade answer quality while increasing the risk of unnecessary data leakage. This paper targets the common setting of answering many heterogeneous questions over long document(s), where fixed position heuristics and standard retrieval-augmented generation (RAG) can fail due to document structure variability and weak query-chunk semantic similarity, which often requires task- and domain-specific tuning of embedding retrievers. We propose Selective Attention-Guided Extraction (SAGE), a training-free, plug-and-play context reduction framework that uses a lightweight local LLM to perform a single prefilling pass and convert language model attention signals into a query-specific…
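
The abstract is cut off before it says what the attention signals are converted into, so the following is only a minimal sketch of the general idea of attention-guided context reduction, not the paper's algorithm. It assumes that per-token attention mass from the question to the document is already available (for example, from one prefilling pass of a small model), that segments are scored by mean attention, and that segments are picked greedily under a token budget. The names `segment_scores`, `select_segments`, and `budget_ratio` are hypothetical.

```python
from typing import List, Tuple


def segment_scores(
    token_attention: List[float], bounds: List[Tuple[int, int]]
) -> List[float]:
    """Mean attention per segment; bounds are half-open (start, end) token ranges.

    Assumption: averaging is one plausible aggregation, not the paper's.
    """
    return [sum(token_attention[s:e]) / (e - s) for s, e in bounds]


def select_segments(
    token_attention: List[float],
    bounds: List[Tuple[int, int]],
    budget_ratio: float = 0.10,
) -> List[int]:
    """Return indices of the highest-scoring segments that fit the token budget.

    The budget is a fraction of the document's total token count. Segments are
    considered in descending score order and skipped if they do not fit, then
    the chosen indices are returned in document order.
    """
    total_tokens = sum(e - s for s, e in bounds)
    budget = int(total_tokens * budget_ratio)
    scores = segment_scores(token_attention, bounds)

    ranked = sorted(range(len(bounds)), key=lambda i: scores[i], reverse=True)
    chosen: List[int] = []
    used = 0
    for i in ranked:
        length = bounds[i][1] - bounds[i][0]
        if used + length <= budget:
            chosen.append(i)
            used += length
    return sorted(chosen)


# Toy example: four 5-token segments with made-up attention values.
attention = [0.1] * 5 + [0.9] * 5 + [0.2] * 5 + [0.7] * 5
segments = [(0, 5), (5, 10), (10, 15), (15, 20)]
print(select_segments(attention, segments, budget_ratio=0.5))  # prints [1, 3]
```

Returning the selected indices in document order keeps the reduced context readable for the downstream answering model. With `budget_ratio=0.10`, this corresponds to the 10% context setting mentioned in the findings, though the actual scoring and selection rules in the paper may differ.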
