SAGE: Selective Attention-Guided Extraction for Token-Efficient Document Indexing
Xinzhi Wang, Peter Baile Chen, Gerardo Vitagliano, Matthew Russo, Jun Chen, Michael Cafarella, Samuel Madden, Chunwei Liu

TL;DR
SAGE is a plug-and-play framework that uses attention signals from a lightweight LLM to efficiently select relevant document segments, enabling high-quality question answering with significantly reduced context size.
Contribution
It introduces a training-free, attention-guided context reduction method that improves long-document QA performance without model fine-tuning or task-specific tuning.
Findings
Achieves a top-4 ranking on QuALITY-hard using only 10% of the context
Reduces token usage by 90% while maintaining accuracy
Outperforms traditional reduction techniques on multiple benchmarks
Abstract
Large language models with long context windows can answer complex questions directly from full-length academic, technical, and policy documents, but passing entire documents is often costly and slow, can degrade answer quality, and increases the risk of unnecessary data leakage. This paper targets the common setting of answering many heterogeneous questions over one or more long documents, where fixed position heuristics and standard retrieval-augmented generation (RAG) can fail because of variable document structure and weak semantic similarity between queries and chunks; RAG also often requires task- and domain-specific tuning of embedding retrievers. We propose Selective Attention-Guided Extraction (SAGE), a training-free, plug-and-play context reduction framework that uses a lightweight local LLM to perform a single prefilling pass and convert language model attention signals into a query-specific…
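The abstract describes turning attention from a single prefilling pass into query-specific relevance and then keeping only a small slice of the document. The sketch below illustrates that general idea under assumptions that do not come from the paper: the question-to-document attention matrix has already been extracted from the model and averaged over layers and heads, the document is pre-split into token spans, each segment is scored by the mean attention mass its tokens receive, and selection is a greedy fill of a token budget (10% by default, mirroring the reported context size). The names `segment_scores` and `select_segments` are hypothetical; this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one document segment


def segment_scores(attn: Sequence[Sequence[float]], spans: Sequence[Span]) -> List[float]:
    """Score each segment by how much attention the question pays to it.

    attn[q][d] is the attention weight from question token q to document
    token d. Assumption (hypothetical preprocessing): weights are already
    averaged over layers and heads. Scoring by mean attention mass is an
    illustrative choice, not the paper's method.
    """
    n_doc = len(attn[0])
    # Attention mass each document token receives, summed over question tokens.
    token_mass = [sum(row[d] for row in attn) for d in range(n_doc)]
    # Mean mass per token, so long segments are not favored only for length.
    return [sum(token_mass[s:e]) / (e - s) for s, e in spans]


def select_segments(scores: Sequence[float], spans: Sequence[Span],
                    n_doc: int, budget_ratio: float = 0.1) -> List[int]:
    """Greedily keep the highest-scoring segments that fit in the token budget."""
    budget = int(budget_ratio * n_doc)  # e.g. 10% of the document's tokens
    kept: List[int] = []
    used = 0
    # Visit segments from most to least attended; skip any that would overflow.
    for i in sorted(range(len(spans)), key=lambda i: scores[i], reverse=True):
        length = spans[i][1] - spans[i][0]
        if used + length <= budget:
            kept.append(i)
            used += length
    # Return indices in document order so the reduced context reads naturally.
    return sorted(kept)
```

For instance, with a 20-token document split into four 5-token segments and `budget_ratio=0.5`, the two segments that receive the most question attention are kept. Returning them in document order rather than score order is a design choice in this sketch, meant to keep the reduced context coherent for the downstream model.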
