mmRAG: A Modular Benchmark for Retrieval-Augmented Generation over Text, Tables, and Knowledge Graphs
Chuan Xu, Qiaosheng Chen, Yutong Feng, Gong Cheng

TL;DR
mmRAG introduces a modular benchmark for evaluating retrieval-augmented generation systems across multiple data modalities, enabling detailed assessment of individual components beyond end-to-end output quality.
Contribution
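It presents a multi-modal benchmark that enables granular evaluation of individual RAG components, such as retrieval and query routing, over text, tables, and knowledge graphs. This addresses two limitations of existing evaluation: its focus on text-only retrieval and its reliance on opaque, end-to-end assessment of generated outputs.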
Findings
Baseline RAG implementations evaluated on mmRAG.
Granular component evaluation is feasible with the benchmark.
Benchmark covers diverse data modalities and question types.
Abstract
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for enhancing the capabilities of large language models. However, existing RAG evaluation predominantly focuses on text retrieval and relies on opaque, end-to-end assessments of generated outputs. To address these limitations, we introduce mmRAG, a modular benchmark designed for evaluating multi-modal RAG systems. Our benchmark integrates queries from six diverse question-answering datasets spanning text, tables, and knowledge graphs, which we uniformly convert into retrievable documents. To enable direct, granular evaluation of individual RAG components -- such as the accuracy of retrieval and query routing -- beyond end-to-end generation quality, we follow standard information retrieval procedures to annotate document relevance and derive dataset relevance. We establish baseline performance by evaluating a wide…
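The abstract describes annotating document relevance and then deriving dataset relevance, so that retrieval and query routing can be scored directly. The minimal sketch below illustrates that idea under stated assumptions. The annotation format (a mapping from (query_id, doc_id) to a graded relevance), the max-aggregation rule for dataset relevance, and the functions derive_dataset_relevance, recall_at_k, and routing_accuracy are all hypothetical and are not taken from the paper. The abstract is truncated here, and the benchmark's actual metrics and aggregation rule may differ.

```python
from collections import defaultdict

# Hypothetical annotation format: (query_id, doc_id) -> graded relevance (0 = not relevant).
# Each doc_id is assumed to map to the source dataset it was converted from.

def derive_dataset_relevance(doc_rel, doc_to_dataset):
    """Assumed rule (not from the paper): a dataset's relevance to a query
    is the maximum relevance of any of its annotated documents for that query."""
    ds_rel = defaultdict(int)
    for (qid, doc_id), grade in doc_rel.items():
        key = (qid, doc_to_dataset[doc_id])
        ds_rel[key] = max(ds_rel[key], grade)
    return dict(ds_rel)

def recall_at_k(ranked, relevant, k):
    """Fraction of relevant items that appear in the top-k positions of a ranking."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)

def routing_accuracy(routes, ds_rel):
    """Share of queries routed to a dataset whose derived relevance is > 0.
    routes maps query_id -> the single dataset a (hypothetical) router selected."""
    if not routes:
        return 0.0
    correct = sum(1 for qid, ds in routes.items() if ds_rel.get((qid, ds), 0) > 0)
    return correct / len(routes)

if __name__ == "__main__":
    # Toy, made-up annotations; dataset names are placeholders.
    doc_rel = {("q1", "d1"): 2, ("q1", "d2"): 0, ("q1", "d3"): 1, ("q2", "d4"): 1}
    doc_to_dataset = {"d1": "text_ds", "d2": "table_ds", "d3": "kg_ds", "d4": "table_ds"}
    ds_rel = derive_dataset_relevance(doc_rel, doc_to_dataset)
    print(recall_at_k(["d2", "d1", "d5"], {"d1", "d3"}, k=2))                  # 0.5
    print(routing_accuracy({"q1": "table_ds", "q2": "table_ds"}, ds_rel))      # 0.5
```

Separating the two scores in this way reflects the benchmark's stated goal: a router that picks the wrong source and a retriever that ranks poorly within the right source fail for different reasons, and a single end-to-end answer score cannot tell them apart.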
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Advanced Graph Neural Networks
Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Layer Normalization · Softmax · Attention Dropout · WordPiece · Residual Connection · Linear Layer · Byte Pair Encoding
