Enhancing RAG Efficiency with Adaptive Context Compression
Shuyu Guo, Shuo Zhang, Zhaochun Ren

TL;DR
This paper introduces ACC-RAG, a dynamic context compression framework for retrieval-augmented generation that adapts compression rates based on input complexity, significantly improving inference efficiency while maintaining accuracy.
Contribution
The paper presents a novel adaptive compression method for RAG that adjusts compression levels dynamically, unlike fixed-rate approaches, enhancing efficiency without accuracy loss.
Findings
ACC-RAG outperforms fixed-rate methods in efficiency.
Achieves over 4x faster inference than standard RAG on multiple datasets.
Maintains or improves accuracy compared to standard RAG.
Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but incurs significant inference costs due to lengthy retrieved contexts. While context compression mitigates this issue, existing methods apply fixed compression rates, over-compressing simple queries or under-compressing complex ones. We propose Adaptive Context Compression for RAG (ACC-RAG), a framework that dynamically adjusts compression rates based on input complexity, optimizing inference efficiency without sacrificing accuracy. ACC-RAG combines a hierarchical compressor (for multi-granular embeddings) with a context selector to retain minimal sufficient information, akin to human skimming. Evaluated on Wikipedia and five QA datasets, ACC-RAG outperforms fixed-rate methods and achieves over 4 times faster inference than standard RAG while maintaining or improving accuracy.
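To make the abstract's two components concrete, here is a minimal sketch of the general idea, not the authors' implementation. The paper's compressor and selector are learned models. This toy replaces them with mean pooling at several granularities and a caller-supplied stopping rule. All names (`hierarchical_compress`, `select_context`, `is_sufficient`, `group_sizes`) are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty group of equal-length vectors element-wise."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def hierarchical_compress(
    token_vecs: Sequence[Vector], group_sizes: Sequence[int]
) -> List[Vector]:
    """Toy stand-in for a hierarchical compressor (assumption: mean pooling).

    Pools token vectors at each group size, coarsest first, and returns
    one flat list of embeddings ordered from coarse to fine.
    """
    embeddings: List[Vector] = []
    for size in sorted(group_sizes, reverse=True):
        for start in range(0, len(token_vecs), size):
            embeddings.append(mean_pool(token_vecs[start:start + size]))
    return embeddings


def select_context(
    embeddings: Sequence[Vector],
    is_sufficient: Callable[[List[Vector]], bool],
    step: int = 1,
) -> List[Vector]:
    """Toy stand-in for a context selector.

    Adds embeddings `step` at a time and stops as soon as the hypothetical
    `is_sufficient` check accepts the kept prefix. If the check never
    passes, every embedding is kept.
    """
    kept: List[Vector] = []
    for start in range(0, len(embeddings), step):
        kept.extend(embeddings[start:start + step])
        if is_sufficient(kept):
            break
    return kept


# Eight 2-dimensional "token" vectors standing in for a retrieved passage.
tokens = [[float(i), 1.0] for i in range(8)]
embeddings = hierarchical_compress(tokens, group_sizes=[2, 4])

# Assumed sufficiency rules: a simple query needs 2 embeddings, a complex one 5.
easy = select_context(embeddings, lambda kept: len(kept) >= 2)
hard = select_context(embeddings, lambda kept: len(kept) >= 5)

print(len(tokens) / len(easy))  # effective compression rate, simple query
print(len(tokens) / len(hard))  # effective compression rate, complex query
```

The point of the sketch is that the effective compression rate is not fixed in advance: it falls out of where the selector stops, so a simple query ends up with a shorter context than a complex one.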
Taxonomy
Topics: Advanced Data Compression Techniques · Context-Aware Activity Recognition Systems
