EncouRAGe: Evaluating RAG Local, Fast, and Reliable

Jan Strich; Adeline Scharfenberg; Chris Biemann; Martin Semmann

arXiv:2511.04696·cs.CL·November 10, 2025

Open Access

TL;DR

EncouRAGe is a modular Python framework that simplifies development and evaluation of RAG systems with a focus on reproducibility, local deployment, and comprehensive benchmarking across multiple datasets.

Contribution

The paper introduces EncouRAGe, a flexible, extensible framework for RAG systems that emphasizes reproducibility, diverse metrics, and local testing, with extensive evaluation on benchmark datasets.

Findings

01 RAG still underperforms compared to the Oracle Context.

02 Hybrid BM25 consistently achieves the best results across all four datasets.

03 Reranking offers marginal improvements at the cost of increased latency.

Abstract

We introduce EncouRAGe, a comprehensive Python framework designed to streamline the development and evaluation of Retrieval-Augmented Generation (RAG) systems using Large Language Models (LLMs) and Embedding Models. EncouRAGe comprises five modular and extensible components: Type Manifest, RAG Factory, Inference, Vector Store, and Metrics, facilitating flexible experimentation and extensible development. The framework emphasizes scientific reproducibility, diverse evaluation metrics, and local deployment, enabling researchers to efficiently assess datasets within RAG workflows. This paper presents implementation details and an extensive evaluation across multiple benchmark datasets, including 25k QA pairs and over 51k documents. Our results show that RAG still underperforms compared to the Oracle Context, while Hybrid BM25 consistently achieves the best results across all four datasets.…
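
The abstract reports Hybrid BM25 as the strongest retrieval setup, but this page does not say how the lexical and dense rankings are combined. One common way to merge two rankings is reciprocal rank fusion (RRF). The sketch below illustrates that general technique under stated assumptions: the function name, the document IDs, and the constant `k=60` are hypothetical and are not taken from EncouRAGe or its API.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. k=60 is a conventional default, not a value
    reported in the paper.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID so the
    # output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings from a BM25 retriever and a dense (embedding) retriever.
bm25_ranking = ["d3", "d1", "d2"]
dense_ranking = ["d1", "d4", "d3"]

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)
```

Documents that rank well in both lists (here `d1` and `d3`) rise above documents that appear in only one, which is the intuition behind combining lexical and dense retrieval.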

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Natural Language Processing Techniques