OmniBench-RAG: A Multi-Domain Evaluation Platform for Retrieval-Augmented Generation Tools
Jiaxuan Liang, Shide Zhou, and Kailong Wang

TL;DR
OmniBench-RAG is an automated, multi-domain evaluation platform that systematically measures the accuracy and efficiency of retrieval-augmented generation systems across diverse fields, addressing reproducibility and comparability issues.
Contribution
The paper introduces OmniBench-RAG, a standardized, domain-aware evaluation framework with novel metrics for assessing RAG systems' performance and efficiency.
Findings
Significant variability in RAG effectiveness across domains.
RAG improves performance in culture but degrades it in mathematics.
The platform enables reproducible, comprehensive comparisons of RAG models.
Abstract
While Retrieval-Augmented Generation (RAG) is now widely adopted to enhance LLMs, evaluating its true performance benefits in a reproducible and interpretable way remains a major hurdle. Existing methods often fall short: they lack domain coverage, employ coarse metrics that miss sub-document precision, and fail to capture computational trade-offs. Most critically, they provide no standardized framework for comparing RAG effectiveness across different models and domains. We introduce OmniBench-RAG, a novel automated platform for multi-domain evaluation of RAG systems. The platform quantifies performance gains across accuracy and efficiency dimensions, spanning nine knowledge fields including culture, geography, and health. We introduce two standardized metrics: Improvements (accuracy gains) and Transformation (efficiency differences between pre-RAG and post-RAG models), enabling…
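The abstract names the two metrics but does not give their formulas, so the following is only a minimal sketch under stated assumptions: Improvements is taken to be the plain difference in accuracy between the RAG-enabled and baseline model, and Transformation is taken to be the ratio of baseline to RAG time per question. The `RunStats` structure, the function names, and every number below are hypothetical placeholders chosen for illustration, not results or definitions from the paper.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """Aggregate results for one model on one domain (hypothetical structure)."""
    correct: int      # number of questions answered correctly
    total: int        # number of questions asked
    seconds: float    # total wall-clock time spent answering

    @property
    def accuracy(self) -> float:
        return self.correct / self.total

    @property
    def seconds_per_question(self) -> float:
        return self.seconds / self.total


def improvement(base: RunStats, rag: RunStats) -> float:
    """Assumed definition: accuracy with RAG minus accuracy without it."""
    return rag.accuracy - base.accuracy


def transformation(base: RunStats, rag: RunStats) -> float:
    """Assumed definition: baseline time per question divided by RAG time.

    Values below 1.0 mean the RAG-enabled model is slower than the baseline.
    """
    return base.seconds_per_question / rag.seconds_per_question


# Placeholder numbers for illustration only; they are NOT from the paper.
results = {
    "culture": (RunStats(60, 100, 50.0), RunStats(75, 100, 80.0)),
    "mathematics": (RunStats(50, 100, 40.0), RunStats(40, 100, 64.0)),
}

for domain, (base, rag) in results.items():
    print(
        f"{domain}: improvement={improvement(base, rag):+.2f}, "
        f"transformation={transformation(base, rag):.3f}"
    )
```

Reporting both numbers per domain keeps the accuracy effect and the efficiency cost separate, so a domain where retrieval helps accuracy but slows responses is visible as such rather than being averaged away.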
Taxonomy
Topics: Topic Modeling · Recommender Systems and Techniques
