EnterpriseRAG-Bench: A RAG Benchmark for Company Internal Knowledge

Yuhong Sun; Joachim Rahmfeld; Chris Weaver; Weijia Chen; Roshan Desai; Wenxi Huang; Mark H. Butler

arXiv:2605.05253·cs.IR·May 21, 2026

TL;DR

EnterpriseRAG-Bench provides a synthetic, realistic dataset and evaluation framework for testing retrieval-augmented generation models on company-internal knowledge sources, addressing a gap in existing benchmarks.

Contribution

It introduces a large-scale, multi-source synthetic enterprise corpus, together with its generation framework and a leaderboard, enabling realistic benchmarking of RAG systems on the kind of proprietary data companies hold internally.

Findings

01. The dataset includes approximately 500,000 documents across nine enterprise source types.

02. Its 500 questions span ten categories that test distinct retrieval and reasoning capabilities.

03. The generation framework allows customization for different industries and data sources.

Abstract

Retrieval-Augmented Generation (RAG) has become the standard approach for grounding large language models in information that was not available during training. While existing datasets and benchmarks focus on web or other public sources, there is still no widely adopted dataset that realistically reflects the nature of company-internal knowledge. Meanwhile, startups, enterprises, and researchers are increasingly developing AI Agents designed to operate over exactly this kind of proprietary data. To close this gap, we release a synthetic enterprise corpus, its generation framework, and a leaderboard. We present EnterpriseRAG-Bench, a dataset consisting of approximately 500,000 documents spanning nine enterprise source types (Slack, Gmail, Linear, Google Drive, HubSpot, Fireflies, GitHub, Jira, and Confluence) and 500 questions across ten categories that test distinct retrieval and…
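To make the benchmark structure described in the abstract more concrete, the following is a minimal sketch of how retrieval quality might be scored per question category. It is not taken from the paper or its repository: the record fields (`category`, `gold_ids`, `retrieved_ids`), the category names, the document ids, and the choice of recall@k as the metric are all assumptions made for illustration. Only the list of nine source types comes from the abstract.

```python
from collections import defaultdict

# The nine source types named in the abstract.
SOURCE_TYPES = [
    "Slack", "Gmail", "Linear", "Google Drive", "HubSpot",
    "Fireflies", "GitHub", "Jira", "Confluence",
]


def recall_at_k(retrieved_ids, gold_ids, k):
    """Fraction of gold documents that appear among the top-k retrieved ids."""
    gold = set(gold_ids)
    if not gold:
        return 0.0
    top_k = set(retrieved_ids[:k])
    return len(top_k & gold) / len(gold)


def recall_by_category(questions, k=5):
    """Average recall@k per question category.

    Each question is a dict with hypothetical keys
    'category', 'gold_ids', and 'retrieved_ids'.
    """
    scores = defaultdict(list)
    for q in questions:
        score = recall_at_k(q["retrieved_ids"], q["gold_ids"], k)
        scores[q["category"]].append(score)
    return {cat: sum(vals) / len(vals) for cat, vals in scores.items()}


# Toy records; ids and category names are invented for illustration only.
questions = [
    {
        "category": "single_source",
        "gold_ids": ["slack-1"],
        "retrieved_ids": ["slack-1", "jira-7"],
    },
    {
        "category": "multi_source",
        "gold_ids": ["gmail-3", "jira-7"],
        "retrieved_ids": ["jira-7", "slack-1"],
    },
]

result = recall_by_category(questions, k=2)
print(result)
```

Grouping scores by category, rather than reporting a single average, is one way to see which of the retrieval capabilities a system handles poorly. The benchmark's actual metrics and data schema should be taken from the repository listed below.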

Code & Models

Repositories

onyx-dot-app/EnterpriseRAG-Bench (GitHub)
