mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs

Chuan Xu; Qiaosheng Chen; Yutong Feng; Gong Cheng

arXiv:2505.11180·cs.IR·May 19, 2025

TL;DR

mmRAG introduces a modular benchmark for evaluating retrieval-augmented generation systems across multiple data modalities, enabling detailed assessment of individual components beyond end-to-end output quality.

Contribution

It presents a novel, multi-modal benchmark that allows granular evaluation of RAG components over text, tables, and knowledge graphs, addressing limitations of existing evaluation methods.

Findings

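01 Baseline performance is established by evaluating RAG implementations on mmRAG.

02 The benchmark supports granular evaluation of individual RAG components, such as retrieval and query routing.

03 The benchmark covers text, tables, and knowledge graphs, with queries drawn from six question-answering datasets.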

Abstract

Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing the capabilities of large language models. However, existing RAG evaluation predominantly focuses on text retrieval and relies on opaque, end-to-end assessments of generated outputs. To address these limitations, we introduce mmRAG, a modular benchmark designed for evaluating multi-modal RAG systems. Our benchmark integrates queries from six diverse question-answering datasets spanning text, tables, and knowledge graphs, which we uniformly convert into retrievable documents. To enable direct, granular evaluation of individual RAG components -- such as the accuracy of retrieval and query routing -- beyond end-to-end generation quality, we follow standard information retrieval procedures to annotate document relevance and derive dataset relevance. We establish baseline performance by evaluating a wide…
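
The abstract mentions annotating document relevance and deriving dataset relevance from it. The following is a minimal sketch of how such labels could be used, not the authors' procedure: it scores a ranking with NDCG@k, a standard IR metric, and aggregates per-document labels into per-dataset scores by summation. The summation rule, the document IDs, the `doc_to_dataset` mapping, and all function names are assumptions made for illustration.

```python
import math
from collections import defaultdict


def dcg(gains):
    """Discounted cumulative gain of a list of graded relevance labels."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(ranked_ids, qrels, k):
    """NDCG@k of a ranking against graded labels (unlabeled docs count as 0)."""
    gains = [qrels.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal = sorted(qrels.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


def dataset_relevance(qrels, doc_to_dataset):
    """Assumed aggregation: sum document labels per source dataset."""
    totals = defaultdict(int)
    for doc_id, rel in qrels.items():
        totals[doc_to_dataset[doc_id]] += rel
    return dict(totals)


# Hypothetical labels for one query: 2 = highly relevant, 1 = partially, 0 = not.
qrels = {"text_1": 2, "table_7": 1, "kg_3": 0}
doc_to_dataset = {"text_1": "text", "table_7": "table", "kg_3": "kg"}

# Hypothetical retriever output, best first.
ranking = ["table_7", "text_1", "kg_3"]

score = ndcg_at_k(ranking, qrels, k=3)
per_dataset = dataset_relevance(qrels, doc_to_dataset)
best_dataset = max(per_dataset, key=per_dataset.get)

print(f"NDCG@3 = {score:.4f}")
print(per_dataset, best_dataset)
```

Under these assumptions, a query router could be scored by checking whether the dataset it selects matches `best_dataset`, which keeps routing evaluation separate from the quality of the generated answer.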

Code & Models

Repositories

nju-websoft/mmrag (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Attention Dropout · WordPiece · Residual Connection · Linear Layer · Byte Pair Encoding