BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

David Rau; Hervé Déjean; Nadezhda Chirkova; Thibault Formal; Shuai Wang; Vassilina Nikoulina; Stéphane Clinchant

arXiv:2407.01102 · cs.CL · July 2, 2024

TL;DR

BERGEN is an open-source benchmarking library designed to standardize and facilitate reproducible evaluation of Retrieval-Augmented Generation systems, enabling fair comparison of different components like retrievers, rerankers, and LLMs.

Contribution

This work introduces BERGEN, a comprehensive library that standardizes RAG benchmarking, addressing inconsistencies and enabling systematic evaluation of retrieval and generation components.

Findings

01. Benchmarking different retrievers, rerankers, and LLMs for QA.

02. Analysis of RAG metrics and datasets.

03. Identification of best practices for RAG evaluation.

Abstract

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) with external knowledge. In response to the recent popularity of generative LLMs, many RAG approaches have been proposed, each involving a large number of configuration choices such as evaluation datasets, collections, metrics, retrievers, and LLMs. Inconsistent benchmarking makes it difficult to compare approaches and to understand the impact of each component in the pipeline. In this work, we study best practices that lay the groundwork for a systematic evaluation of RAG and present BERGEN, an end-to-end library for reproducible research that standardizes RAG experiments. In an extensive study focusing on QA, we benchmark different state-of-the-art retrievers, rerankers, and LLMs. Additionally, we analyze existing RAG metrics and datasets. Our open-source library BERGEN is available under…
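
To make the benchmarking problem concrete, the following is a minimal sketch of evaluating every retriever and generator combination on one shared dataset with one shared metric. It does not use BERGEN's API. The collection, dataset, component names (`overlap_retriever`, `fixed_retriever`, `first_word_generator`, `closed_book_generator`), and the substring `match` metric are all hypothetical stand-ins chosen for illustration; the reranking stage is omitted for brevity.

```python
from collections import Counter
from itertools import product

# Toy collection and QA dataset; all data is invented for illustration.
COLLECTION = {
    "d1": "Paris is the capital of France",
    "d2": "Berlin is the capital of Germany",
    "d3": "The Seine flows through Paris",
}
DATASET = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "What is the capital of Germany?", "answer": "Berlin"},
]


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation removed."""
    return [t.strip("?.,!").lower() for t in text.split()]


def overlap_retriever(question, k=2):
    """Hypothetical retriever: rank documents by token overlap with the question."""
    q = Counter(tokenize(question))
    scored = []
    for doc_id, text in COLLECTION.items():
        shared = sum((q & Counter(tokenize(text))).values())
        scored.append((shared, doc_id))
    scored.sort(key=lambda s: (-s[0], s[1]))  # best score first, ties by id
    return [doc_id for _, doc_id in scored[:k]]


def fixed_retriever(question, k=2):
    """Hypothetical baseline: always return the first k documents."""
    return sorted(COLLECTION)[:k]


def first_word_generator(question, docs):
    """Stand-in for an LLM: answer with the first word of the top document."""
    return docs[0].split()[0] if docs else ""


def closed_book_generator(question, docs):
    """Stand-in for an LLM that ignores the retrieved context."""
    return "unknown"


def match(response, reference):
    """Substring metric: is the reference answer contained in the response?"""
    return reference.lower() in response.lower()


def run_grid(retrievers, generators, dataset):
    """Score every (retriever, generator) pair on the same data and metric."""
    results = {}
    for (r_name, retrieve), (g_name, generate) in product(
        retrievers.items(), generators.items()
    ):
        hits = 0
        for example in dataset:
            docs = [COLLECTION[d] for d in retrieve(example["question"])]
            hits += match(generate(example["question"], docs), example["answer"])
        results[(r_name, g_name)] = hits / len(dataset)
    return results


results = run_grid(
    {"overlap": overlap_retriever, "fixed": fixed_retriever},
    {"first_word": first_word_generator, "closed_book": closed_book_generator},
    DATASET,
)
for config, score in sorted(results.items()):
    print(config, score)
```

Holding the dataset, collection, and metric fixed while varying one component at a time is what lets the scores be attributed to that component; changing several of them at once between experiments is the kind of inconsistency the abstract describes.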

Code & Models

Repositories

naver/bergen (PyTorch, official)

Videos

BERGEN: A Benchmarking Library for Retrieval-Augmented Generation

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Weight Decay · Multi-Head Attention · Residual Connection · WordPiece · Softmax · Byte Pair Encoding · Layer Normalization