BERGEN: A Benchmarking Library for Retrieval-Augmented Generation
David Rau, Hervé Déjean, Nadezhda Chirkova, Thibault Formal, Shuai Wang, Vassilina Nikoulina, Stéphane Clinchant

TL;DR
BERGEN is an open-source benchmarking library designed to standardize and facilitate reproducible evaluation of Retrieval-Augmented Generation systems, enabling fair comparison of different components like retrievers, rerankers, and LLMs.
Contribution
This work introduces BERGEN, a comprehensive library that standardizes RAG benchmarking, addressing inconsistencies and enabling systematic evaluation of retrieval and generation components.
Findings
Benchmarking different retrievers, rerankers, and LLMs for QA.
Analysis of RAG metrics and datasets.
Identification of best practices for RAG evaluation.
Abstract
Retrieval-Augmented Generation (RAG) makes it possible to enhance Large Language Models (LLMs) with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed. These involve a large number of configuration choices, such as evaluation datasets, collections, metrics, retrievers, and LLMs. Inconsistent benchmarking makes it difficult to compare approaches and to understand the impact of each component in the pipeline. In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research that standardizes RAG experiments. In an extensive study focusing on QA, we benchmark different state-of-the-art retrievers, rerankers, and LLMs. Additionally, we analyze existing RAG metrics and datasets. Our open-source library BERGEN is available under…
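To make the idea of component-wise benchmarking concrete, the following is a minimal sketch of how one might hold the dataset, generator, and metric fixed while swapping only the retriever. It does not use or reproduce the BERGEN API: every name here (`overlap_retriever`, `first_k_retriever`, `echo_generator`, `match`, `evaluate`) and the toy data are hypothetical, and the "generator" is a stand-in that simply returns the top passage rather than calling an LLM.

```python
from typing import Callable, Dict, List

# Toy collection and QA set; all contents are made up for illustration.
DOCS: List[str] = [
    "Bergen is a city on the west coast of Norway.",
    "Oslo is the capital of Norway.",
    "Retrieval-Augmented Generation adds retrieved passages to the prompt.",
]
QA: List[Dict[str, str]] = [
    {"question": "What is the capital of Norway?", "answer": "Oslo"},
    {"question": "On which coast of Norway is Bergen?", "answer": "west"},
]

Retriever = Callable[[str, List[str], int], List[str]]
Generator = Callable[[str, List[str]], str]


def tokens(text: str) -> set:
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return {w.strip(".,?!").lower() for w in text.split()}


def overlap_retriever(query: str, docs: List[str], k: int) -> List[str]:
    """Hypothetical retriever: rank documents by token overlap with the query."""
    q = tokens(query)
    ranked = sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)
    return ranked[:k]


def first_k_retriever(query: str, docs: List[str], k: int) -> List[str]:
    """Hypothetical baseline that ignores the query entirely."""
    return docs[:k]


def echo_generator(question: str, passages: List[str]) -> str:
    """Stand-in for an LLM: returns the top passage verbatim."""
    return passages[0] if passages else ""


def match(prediction: str, reference: str) -> float:
    """1.0 if the reference answer appears in the prediction, else 0.0."""
    return float(reference.lower() in prediction.lower())


def evaluate(retriever: Retriever, generator: Generator, k: int = 1) -> float:
    """Average match score over the QA set for one pipeline configuration."""
    scores = []
    for example in QA:
        passages = retriever(example["question"], DOCS, k)
        prediction = generator(example["question"], passages)
        scores.append(match(prediction, example["answer"]))
    return sum(scores) / len(scores)


# Same data, generator, and metric; only the retriever changes.
retrievers: Dict[str, Retriever] = {
    "overlap": overlap_retriever,
    "first_k": first_k_retriever,
}
results = {name: evaluate(r, echo_generator) for name, r in retrievers.items()}
print(results)
```

Because everything except the retriever is held constant, any difference between the two scores can be attributed to retrieval alone, which is the kind of controlled comparison that inconsistent benchmarking setups make hard.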
Taxonomy
Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Multi-Head Attention · Residual Connection · WordPiece · Softmax · Byte Pair Encoding · Layer Normalization
