Familiarity-Aware Evidence Compression for Retrieval-Augmented Generation
Dongwon Jung, Qin Liu, Tenghao Huang, Ben Zhou, Muhao Chen

TL;DR
FaviComp is a training-free evidence compression method that enhances retrieval-augmented generation by making retrieved evidence more familiar to language models, leading to significant improvements in open-domain QA accuracy.
Contribution
The paper introduces FaviComp, a novel, training-free evidence compression technique that makes retrieved evidence more familiar to the target model and improves its performance on retrieval-augmented generation tasks.
Findings
Outperforms recent evidence compression baselines in QA accuracy
Achieves up to 28.1% accuracy improvement
Maintains high compression rates while enhancing evidence familiarity
Abstract
Retrieval-augmented generation (RAG) improves large language models (LMs) by incorporating non-parametric knowledge through evidence retrieved from external sources. However, it often struggles to cope with inconsistent and irrelevant information that can distract the LM from its tasks, especially when multiple evidence pieces are required. While compressing the retrieved evidence with a compression model aims to address this issue, the compressed evidence may still be unfamiliar to the target model used for downstream tasks, so the target model may fail to use the evidence effectively. We propose FaviComp (Familiarity-Aware Evidence Compression), a novel training-free evidence compression technique that makes retrieved evidence more familiar to the target model, while seamlessly integrating parametric knowledge from the model. Experimental results show that FaviComp consistently outperforms…
Peer Reviews
Decision: ICLR 2025 Conference Withdrawn Submission
- The idea of incorporating retrieved external knowledge with parametric knowledge via ensemble decoding is interesting.
- The proposed FaviComp is a training-free method and can be easily applied to existing LLMs for content compression in retrieval-augmented generation (RAG).
- FaviComp achieves competitive performance compared to baseline methods across five QA benchmarks.
- Unfair comparison with baseline models. Despite using the same backbone model as the target model, the proposed FaviComp uses more advanced Llama3-8B-instruct/Mistral-7B-Instruct as the compression model, while the baselines use less capable models, such as T5-large or Llama2-7B, for compression. What is the rationale behind this inconsistent choice? Can we use the same backbone model (e.g., Llama3-8B-instruct/Mistral-7B-Instruct) for compression in both the baseline methods (e.g., LongLLMLing
This paper presents a training-free solution for adapting retrieved evidence to a target model, an interesting and innovative direction. Experimental results on LLaMA 3 8B Instruct and Mistral 7B Instruct demonstrate that the proposed solution outperforms trained ranker and compressor models.
While empirical results are provided, the intuitive design that relies on the perplexity of both the compressor and target models requires further justification. Given that target models often have limitations such as restricted knowledge and tendencies toward hallucination, it would be beneficial to address how these limitations impact the proposed approach’s effectiveness. Missing discussion of related work: Bridging the Preference Gap between Retrievers and LLMs
1. Compression-based RAG presents a promising research direction in terms of reducing inference latency and improving robustness to irrelevant retrieved evidence.
2. This paper proposes an interesting perspective of enabling the target model to become familiar with the compressed summary.
1. The motivation of compression-based RAG is to improve the latency of standard RAG. However, ensembling the decoding probabilities of the compression model and the target model does not reduce inference cost compared to standard RAG. I'm not sure whether FaviComp even consumes more computation than standard RAG.
2. I have doubts about the validity of ensembling decoding probabilities obtained under different conditions. The compression model is decoding under line 159: (Evi
1. An effective prompt compression method is proposed, which achieves the best compression ratio on multiple datasets while improving the model's performance metrics.
2. No training is required, and compression is performed during inference, which reduces the training cost, but there is also a concern that the inference time will increase.
3. The paper and its presentation are generally clear.
1. Some details need to be clarified. The theoretical basis and intuition of the motivation need to be further elaborated.
2. The novelty of the method is limited. In essence, it is more like an ensemble over evidence generated under different prompts, with one prompt using the target LLM to generate evidence and the other using the original document to generate evidence.
3. More ablation experiments are needed.
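The reviews above describe FaviComp as ensembling the decoding probabilities of the compression model and the target model. A minimal sketch of one such decoding step on a toy four-token vocabulary is given here for illustration. The function names, the mixing weight `alpha`, and the weighted sum of log-probabilities are assumptions made for this example; the text above does not specify the paper's exact combination rule.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def ensemble_step(comp_logits, target_logits, alpha=0.5):
    """Pick the next token from a weighted sum of two models' log-probabilities.

    comp_logits:   hypothetical logits from the compression model, which reads
                   the retrieved evidence.
    target_logits: hypothetical logits from the target model, which relies on
                   its parametric knowledge.
    alpha:         assumed mixing weight; 0 uses only the compression model,
                   1 uses only the target model.
    """
    comp_lp = log_softmax(comp_logits)
    target_lp = log_softmax(target_logits)
    scores = [(1 - alpha) * c + alpha * t for c, t in zip(comp_lp, target_lp)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Toy logits over a four-token vocabulary (made-up numbers).
comp_logits = [2.0, 1.0, 0.0, -1.0]    # compression model prefers token 0
target_logits = [0.0, 3.0, 0.0, -1.0]  # target model prefers token 1

for alpha in (0.0, 0.5, 1.0):
    token, _ = ensemble_step(comp_logits, target_logits, alpha)
    print(f"alpha={alpha}: chosen token index {token}")
```

In this toy setting, raising `alpha` shifts the choice toward tokens the target model already finds likely. Both distributions must be computed at every step, which is the inference-cost concern raised in the reviews.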
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
