RARe: Retrieval Augmented Retrieval with In-Context Examples
Atula Tejaswi, Yoonsang Lee, Sujay Sanghavi, Eunsol Choi

TL;DR
This paper introduces RARe, a method that enhances encoder-only text retrieval models by finetuning with semantically similar in-context query-document examples, leading to improved retrieval performance and better out-of-domain generalization.
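The reviews note that RARe retrieves its in-context examples with BM25, and the ablations compare retrieved examples against random ones. The sketch below illustrates both selection modes over a pool of (query, document) training pairs. It is a minimal, self-contained illustration and not the authors' code. The following are all assumptions chosen for this example:

- the whitespace tokenizer,
- the BM25 parameters `k1=1.2` and `b=0.75`,
- the hypothetical names `BM25` and `select_examples`,
- the rule that excludes pairs whose query equals the target.

```python
import math
import random
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[str, str]  # (query, document)


def tokenize(text: str) -> List[str]:
    # Assumption: naive lowercase whitespace tokenization.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 over a list of texts (hypothetical helper)."""

    def __init__(self, texts: List[str], k1: float = 1.2, b: float = 0.75):
        self.k1, self.b = k1, b
        self.docs = [tokenize(t) for t in texts]
        self.tfs = [Counter(d) for d in self.docs]
        self.avgdl = sum(len(d) for d in self.docs) / len(self.docs)
        df = Counter(term for d in self.docs for term in set(d))
        n = len(self.docs)
        # Non-negative IDF variant, log(1 + (N - df + 0.5) / (df + 0.5)).
        self.idf = {t: math.log(1 + (n - f + 0.5) / (f + 0.5)) for t, f in df.items()}

    def score(self, query: str, i: int) -> float:
        tf, dl = self.tfs[i], len(self.docs[i])
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                norm = f + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
                total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def rank(self, query: str) -> List[int]:
        # Indices by descending score; ties keep corpus order (sort is stable).
        return sorted(range(len(self.docs)), key=lambda i: self.score(query, i), reverse=True)


def select_examples(target_query: str, pool: List[Pair], index: BM25, k: int,
                    mode: str = "retrieved",
                    rng: Optional[random.Random] = None) -> List[Pair]:
    """Pick k in-context examples, skipping pairs whose query equals the target."""
    candidates = [i for i in range(len(pool)) if pool[i][0] != target_query]
    if mode == "random":
        rng = rng or random.Random(0)
        return [pool[i] for i in rng.sample(candidates, k)]
    if mode != "retrieved":
        raise ValueError(f"unknown mode: {mode}")
    allowed = set(candidates)
    ranked = [i for i in index.rank(target_query) if i in allowed]
    return [pool[i] for i in ranked[:k]]
```

To use it, build the index once over the pool queries, for example `index = BM25([q for q, _ in pool])`. Then call `select_examples(query, pool, index, k)` for each target query. Switching `mode` to `"random"` gives the random-example baseline used in ablations of this kind.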
Contribution
RARe is the first approach to incorporate in-context examples into encoder-only retrieval models. It demonstrates performance improvements and provides insights into how in-context examples should be designed.
Findings
Up to +2.72% nDCG improvement on retrieval datasets
RARe shows stronger out-of-domain generalization
Analysis of in-context example augmentation strategies
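The findings report gains in nDCG. For reference, the sketch below computes nDCG@k for a single query from a ranked list of document IDs and graded relevance judgments (qrels). The design choices are assumptions:

- It uses linear gains (the relevance label itself) and the common log2(rank + 1) discount. Some toolkits instead use exponential gains, 2^rel - 1.
- The default cutoff is k=10, the cutoff conventionally reported on BEIR.
- The function names are illustrative.

```python
import math
from typing import Dict, List, Sequence


def dcg_at_k(gains: Sequence[float], k: int) -> float:
    # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: List[str], qrels: Dict[str, int], k: int = 10) -> float:
    """nDCG@k for one query with linear gains; 0.0 if there are no relevant docs."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids]
    ideal = sorted(qrels.values(), reverse=True)  # best possible ordering
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0
```

For example, `ndcg_at_k(["d2", "d1", "d3"], {"d1": 1})` returns 1/log2(3), about 0.631, because the only relevant document sits at rank 2. A dataset-level score is typically the mean over all queries.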
Abstract
While in-context learning is well-studied with decoder-only language models (LLMs), its utility for encoder-only models remains underexplored. We study in-context learning for encoder-only models for text retrieval tasks. Can adding in-context examples (query-document pairs) to the target query enhance retriever performance? Our approach, RARe, finetunes a pre-trained model with in-context examples whose query is semantically similar to the target query. This approach achieves performance gains of up to +2.72% nDCG across open-domain retrieval datasets (BeIR, RAR-b) compared to using the target query only as an input. In particular, we find RARe exhibits stronger out-of-domain generalization compared to models using queries without in-context examples, similar to what is seen for in-context learning in LLMs. We further provide analysis on the design choices of in-context example…
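The abstract describes augmenting the target query with in-context (query, document) pairs before encoding. A minimal sketch of that input construction follows. The template strings ("Instruct:", "Query:", "Document:") and the function name `build_augmented_query` are hypothetical placeholders, not the exact format used in the paper. In this sketch only the query side changes: the resulting string is what would be passed to the encoder in place of the bare query.

```python
from typing import List, Tuple


def build_augmented_query(target_query: str,
                          examples: List[Tuple[str, str]],
                          instruction: str = "") -> str:
    """Prepend in-context (query, document) examples to the target query.

    Hypothetical template for illustration only.
    """
    parts = []
    if instruction:
        parts.append(f"Instruct: {instruction}")
    for query, document in examples:
        parts.append(f"Query: {query}\nDocument: {document}")
    # The target query comes last, with no document attached.
    parts.append(f"Query: {target_query}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo = build_augmented_query(
        "why does the sea have tides",
        [("what causes ocean tides", "Tides are caused by the moon's gravity.")],
        instruction="Retrieve passages that answer the question",
    )
    print(demo)
```

With an empty example list the function reduces to the plain query, `"Query: ..."`. That corresponds to the "target query only" baseline the abstract compares against.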
Peer Reviews
Decision: Submitted to ICLR 2025
- While in-context learning is not new, its application to retriever models in this specific manner is new. The paper creatively adapts this technique, showing potential new directions for retrieval model improvements.
- While the application of in-context learning to retrievers is new, it might not strike everyone as a groundbreaking shift.
- The approach, as well as the performance gain, looks a bit incremental. There might also be a tradeoff between efficiency, accuracy, and ease of use of the method.
- The discussion of how RARe might scale, or what challenges it might face in real-world applications beyond the benchmarks used, could be expanded.
1. The paper is well-structured and easy to follow.
2. Extensive experiments were conducted on recent base models and popular datasets, such as MS-MARCO, BeIR, and RAR-b, demonstrating that the proposed RARe method enhances baseline models, including Llama and other LLM-based retrievers.
3. Detailed ablation studies investigate critical questions, such as the impact of retrieved vs. random in-context examples and whether semantically closer in-context examples are more beneficial.
1. There are no statistical significance tests to confirm that the improvements over baselines in Tables 1 and 2 are meaningful.
2. Only a basic retriever, BM25, was applied.
3. In Figure 3, the performance of ArguAna’s Retrieved/Random setup is worse than Random/Random, which is inconsistent with other datasets and lacks an explanation.
4. Figure 4 appears to contradict the paper’s premise, which relies on similar queries and their associated documents to enhance query representation. When scor
1. Adding in-context examples to the query is an underexplored topic for the retrieval community.
2. The results are positive and the ablations are quite extensive.
1. The baselines are a bit weak, and I am not sure how much value a less-than-2% improvement will add. There are other leading models on the BEIR benchmark: how would the proposed method compare to those, and would those methods improve after adding in-context examples?
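The reviews question whether gains of this size are meaningful, and one reviewer notes the absence of significance tests for Tables 1 and 2. One common way to check this in retrieval evaluation is a paired randomization (sign-flip) test. It runs over per-query scores from both systems on the same queries, such as nDCG@10 for the baseline and for RARe. The sketch below is a generic version of that test, not an analysis reported in the paper. The function name, the default of 10,000 resamples, and the two-sided statistic on the mean difference are assumptions.

```python
import random
from typing import Sequence


def paired_randomization_test(scores_a: Sequence[float], scores_b: Sequence[float],
                              n_resamples: int = 10_000, seed: int = 0) -> float:
    """Two-sided p-value for the mean per-query difference between systems A and B.

    Under the null hypothesis the two systems are exchangeable on each query,
    so each per-query difference is equally likely to have either sign.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        # Randomly flip the sign of each difference (swap the system labels).
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        # Small tolerance so floating-point noise does not drop ties.
        if abs(flipped) / n >= observed - 1e-12:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)
```

A small p-value, for example below 0.05, would suggest the per-query improvement is unlikely under the null hypothesis. Because results span many datasets, a multiple-comparison correction would also be worth applying.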
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Topic Modeling
Methods: Balanced Selection
