FeB4RAG: Evaluating Federated Search in the Context of Retrieval Augmented Generation
Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, Guido Zuccon

TL;DR
This paper introduces FeB4RAG, a new dataset for federated search tailored to Retrieval-Augmented Generation, addressing the lack of modern benchmarks for heterogeneous data sources in RAG systems.
Contribution
The paper presents FeB4RAG, a novel dataset derived from BEIR and designed specifically for federated search in RAG frameworks. It includes relevance judgments and an evaluation of how federated search quality affects RAG responses.
Findings
High-quality federated search improves RAG response relevance
Naive federated search approaches underperform compared to optimized methods
FeB4RAG enables development and benchmarking of advanced federated search techniques
Abstract
Federated search systems aggregate results from multiple search engines, selecting appropriate sources to enhance result quality and align with user intent. With the increasing uptake of Retrieval-Augmented Generation (RAG) pipelines, federated search can play a pivotal role in sourcing relevant information across heterogeneous data sources to generate informed responses. However, existing datasets, such as those developed in the past TREC FedWeb tracks, predate the RAG paradigm shift and lack representation of modern information retrieval challenges. To bridge this gap, we present FeB4RAG, a novel dataset specifically designed for federated search within RAG frameworks. This dataset, derived from 16 sub-collections of the widely used BEIR benchmarking collection, includes 790 information requests (akin to conversational queries) tailored for chatbot applications, along with top…
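The two steps the abstract attributes to federated search, selecting appropriate sources and then aggregating their results, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `SOURCES` collections, the token-overlap score, and the function names `select_sources` and `federated_search` are hypothetical and are not taken from FeB4RAG or BEIR. A real selector would typically rely on source summaries or learned models rather than scanning every document.

```python
from collections import Counter

# Hypothetical toy collections; names and documents are illustrative only.
SOURCES = {
    "medical": ["aspirin reduces fever and pain", "vaccines train the immune system"],
    "finance": ["index funds track a market index", "bonds pay fixed interest"],
    "cooking": ["simmer the sauce to reduce it", "knead dough until elastic"],
}


def overlap(query, text):
    """Count query tokens that also occur in text (a crude relevance proxy)."""
    q = Counter(query.lower().split())
    t = Counter(text.lower().split())
    return sum((q & t).values())


def select_sources(query, sources, k=2):
    """Resource selection: rank sources by their best-matching document, keep the top k."""
    scored = {name: max(overlap(query, d) for d in docs) for name, docs in sources.items()}
    ranked = sorted(scored, key=lambda name: (-scored[name], name))
    return ranked[:k]


def federated_search(query, sources, k=2, top_n=3):
    """Query only the selected sources, then merge their hits into one ranked list."""
    hits = []
    for name in select_sources(query, sources, k):
        for doc in sources[name]:
            score = overlap(query, doc)
            if score > 0:
                hits.append((score, name, doc))
    # Sort by descending score; source name and text break ties deterministically.
    hits.sort(key=lambda h: (-h[0], h[1], h[2]))
    return hits[:top_n]


if __name__ == "__main__":
    for score, source, doc in federated_search("how do index funds track the market", SOURCES):
        print(score, source, doc)
```

In a RAG pipeline, the merged list would be passed to the generator as context, which is why the quality of the selection step can affect the final response.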
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Caching and Content Delivery
Methods: WordPiece · Linear Warmup With Linear Decay · Dropout · Linear Layer · Weight Decay · Byte Pair Encoding · Attention Dropout · Dense Connections · Adam
