SARA: Selective and Adaptive Retrieval-augmented Generation with Context Compression
Yiqiao Jin, Kartik Sharma, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar

TL;DR
SARA is a novel retrieval-augmented generation framework that effectively balances detailed context and semantic coverage using dual-level context representations, improving answer relevance and correctness across multiple datasets and models.
Contribution
SARA introduces a unified approach combining natural-language snippets and semantic compression vectors for enhanced context efficiency in RAG systems.
Findings
Improves answer relevance by +17.71
Enhances answer correctness by +13.72
Boosts semantic similarity by +15.53
Abstract
Retrieval-augmented Generation (RAG) extends large language models (LLMs) with external knowledge but faces key challenges: restricted effective context length and redundancy in retrieved documents. Pure compression-based approaches reduce input size but often discard fine-grained details essential for factual accuracy. We propose SARA, a unified RAG framework that balances local precision and global knowledge coverage under tight context budgets. SARA combines natural-language text snippets with semantic compression vectors to jointly enhance context efficiency and answer correctness. It represents contexts at two complementary levels: 1) fine-grained natural-language spans that preserve critical entities and numerical values, and 2) compact, interpretable vectors that summarize high-level semantics. An iterative evidence-selection module employs the compression vectors for dynamic…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
MethodsLinear Warmup With Linear Decay · Refunds@Expedia|||How do I get a full refund from Expedia? · Attention Dropout · Byte Pair Encoding · Dense Connections · Softmax · Layer Normalization · Dropout · BERT · BART
