AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models
Qianchi Zhang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng

TL;DR
AdaComp is an adaptive, extractive context compression method for retrieval-augmented large language models. It reduces inference cost by predicting, for each query, how many of the top-ranked retrieved documents are needed and keeping only those.
Contribution
The paper introduces a low-cost, adaptive compression technique that determines per query how many retrieved documents to retain, based on query complexity and retrieval quality.
Findings
Significantly reduces inference costs in RAG systems.
Maintains near-original performance levels with compressed context.
Effective across multiple QA datasets and conversational settings.
Abstract
Retrieved documents containing noise will hinder RAG from detecting answer clues and make the inference process slow and expensive. Therefore, context compression is necessary to enhance its accuracy and efficiency. Existing context compression methods use extractive or generative models to retain the most query-relevant sentences or apply the information bottleneck theory to preserve sufficient information. However, these methods may face issues such as over-compression or high computational costs. We observe that the retriever often ranks relevant documents at the top, but the exact number of documents needed to answer the query is uncertain due to the impact of query complexity and retrieval quality: complex queries like multi-hop questions may require retaining more documents than simpler queries, and a low-quality retrieval may need to rely on more documents to generate accurate…
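The idea described in the abstract, keeping a query-dependent number of top-ranked documents, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: `compress_context`, `toy_predictor`, and the word-count heuristic are hypothetical and are not the paper's predictor or training procedure. The sketch only shows the interface such a predictor would plug into.

```python
from typing import Callable, List, Sequence


def compress_context(
    query: str,
    ranked_docs: Sequence[str],
    predict_k: Callable[[str, Sequence[str]], int],
) -> List[str]:
    """Keep only the top-k retrieved documents, with k chosen per query."""
    k = predict_k(query, ranked_docs)
    # Clamp so a bad prediction can never index past the retrieved list.
    k = max(0, min(k, len(ranked_docs)))
    return list(ranked_docs[:k])


def toy_predictor(query: str, ranked_docs: Sequence[str]) -> int:
    """Hypothetical stand-in for a trained predictor (NOT the paper's model).

    Uses query length as a crude proxy for query complexity:
    longer queries keep more documents.
    """
    return 1 if len(query.split()) <= 5 else 3


# Documents are assumed to be sorted by retriever rank, best first.
docs = ["doc A", "doc B", "doc C", "doc D", "doc E"]

simple = compress_context("Who wrote Hamlet?", docs, toy_predictor)
multi_hop = compress_context(
    "Which university did the director of the film Inception attend?",
    docs,
    toy_predictor,
)
```

In this sketch the short query keeps a single document while the longer, multi-hop style query keeps three. A real predictor would presumably also look at the retrieved documents themselves to account for retrieval quality, which the toy heuristic ignores.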
