OmniBench-RAG: A Multi-Domain Evaluation Platform for Retrieval-Augmented Generation Tools

Jiaxuan Liang, Shide Zhou, and Kailong Wang

arXiv:2508.05650 · cs.IR · August 11, 2025

TL;DR

OmniBench-RAG is an automated, multi-domain evaluation platform that systematically measures the accuracy and efficiency of retrieval-augmented generation systems across diverse fields, addressing reproducibility and comparability issues.

Contribution

The paper introduces OmniBench-RAG, a standardized, domain-aware evaluation framework with novel metrics for assessing RAG systems' performance and efficiency.

Findings

01. RAG effectiveness varies significantly across domains.

02. RAG improves performance in culture but degrades it in mathematics.

03. The platform enables reproducible, comprehensive comparisons of RAG models.

Abstract

While Retrieval-Augmented Generation (RAG) is now widely adopted to enhance LLMs, evaluating its true performance benefits in a reproducible and interpretable way remains a major hurdle. Existing methods often fall short: they lack domain coverage, employ coarse metrics that miss sub-document precision, and fail to capture computational trade-offs. Most critically, they provide no standardized framework for comparing RAG effectiveness across different models and domains. We introduce OmniBench-RAG, a novel automated platform for multi-domain evaluation of RAG systems. The platform quantifies performance gains across accuracy and efficiency dimensions, spanning nine knowledge fields including culture, geography, and health. We introduce two standardized metrics: Improvements (accuracy gains) and Transformation (efficiency differences between pre-RAG and post-RAG models), enabling…
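The abstract names the two metrics but, as excerpted here, does not give their formulas. The sketch below is only one plausible reading under stated assumptions, not the paper's definition: Improvements is taken as the plain accuracy difference (post-RAG minus pre-RAG), and Transformation as the relative change in two efficiency measures (latency and memory), which are themselves assumed. The names RunResult, improvements, and transformation are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RunResult:
    """Hypothetical summary of one model run on one domain."""
    accuracy: float   # fraction of questions answered correctly, 0..1
    latency_s: float  # mean seconds per query (assumed efficiency measure)
    memory_mb: float  # peak memory in MB (assumed efficiency measure)


def improvements(pre: RunResult, post: RunResult) -> float:
    """Assumed definition: accuracy gain as post-RAG minus pre-RAG accuracy."""
    return post.accuracy - pre.accuracy


def transformation(pre: RunResult, post: RunResult) -> Dict[str, float]:
    """Assumed definition: relative change (post - pre) / pre per efficiency measure.

    Positive values mean the post-RAG model is slower or uses more memory.
    """
    return {
        "latency": (post.latency_s - pre.latency_s) / pre.latency_s,
        "memory": (post.memory_mb - pre.memory_mb) / pre.memory_mb,
    }
```

Computing both values per knowledge field (one pre-RAG and one post-RAG RunResult per domain) is what would make domain-level comparisons, such as culture versus mathematics, possible under this reading. A relative rather than absolute efficiency difference is chosen here only so that domains with different baseline costs stay comparable; the paper may define Transformation differently.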

Code & Models

Datasets

GarnettLiang/Omnibench-RAG

Taxonomy

Topics: Topic Modeling · Recommender Systems and Techniques