Know Your RAG: Dataset Taxonomy and Generation Strategies for Evaluating RAG Systems
Rafael Teixeira de Lima (1), Shubham Gupta (1), Cesar Berrospi (2), Lokesh Mishra (2), Michele Dolfi (2), Peter Staar (2), Panagiotis Vagenas (2) ((1) IBM Research Paris-Saclay, (2) IBM Research Zurich)

TL;DR
This paper addresses the challenges in evaluating RAG systems by proposing dataset characterization and targeted generation strategies, demonstrating that small fine-tuned LLMs can effectively produce quality Q&A datasets for better system assessment.
Contribution
It introduces a taxonomy for RAG datasets, highlights issues with current data generation methods, and proposes label-based characterization and fine-tuned LLMs for improved dataset creation.
Findings
Public Q&A datasets can mislead RAG performance evaluation.
Common dataset generation tools can produce unbalanced data.
Fine-tuned small LLMs can generate effective Q&A datasets.
Abstract
Retrieval Augmented Generation (RAG) systems are a widespread application of Large Language Models (LLMs) in the industry. While many tools exist empowering developers to build their own systems, measuring their performance locally, with datasets reflective of the system's use cases, is a technological challenge. Solutions to this problem range from non-specific and cheap (most public datasets) to specific and costly (generating data from local documents). In this paper, we show that using public question and answer (Q&A) datasets to assess retrieval performance can lead to non-optimal systems design, and that common tools for RAG dataset generation can lead to unbalanced data. We propose solutions to these issues based on the characterization of RAG datasets through labels and through label-targeted data generation. Finally, we show that fine-tuned small LLMs can efficiently generate…
Taxonomy
Topics: Context-Aware Activity Recognition Systems · Quality and Safety in Healthcare
Methods: Attention Is All You Need · Weight Decay · Linear Warmup With Linear Decay · Linear Layer · Layer Normalization · WordPiece · Attention Dropout · Multi-Head Attention · Byte Pair Encoding
