Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework
Yuming Yang, Jiang Zhong, Li Jin, Jingwang Huang, Jingpeng Gao, Qing Liu, Yang Bai, Jingyuan Zhang, Rui Jiang, Kaiwen Wei

TL;DR
This paper introduces a new benchmark and framework for evaluating multimodal retrieval-augmented generation specifically on chart-based documents, highlighting current limitations and providing a comprehensive evaluation dataset.
Contribution
The work presents CHARGE, a semi-automatic framework for generating chart-based QA data, and introduces Chart-MRAG Bench, a new benchmark for complex visual-text reasoning tasks.
Findings
Unified retrieval methods struggle with chart data
State-of-the-art models achieve only 58.19% correctness
Models show bias towards text over visual information
Abstract
Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three…
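The three CHARGE stages named in the abstract (structured keypoint extraction, crossmodal verification, keypoint-based generation) could be pictured roughly as follows. This is an illustrative sketch only, not the authors' implementation: the `Keypoint` schema, the function names, the exact-match verification rule, the question template, and the sample data are all hypothetical. The actual framework presumably relies on models and expert validation rather than these toy rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keypoint:
    """One fact as (subject, attribute, value). Hypothetical schema."""
    subject: str
    attribute: str
    value: str
    source: str  # "text" or "chart"


def extract_keypoints(records, source):
    """Stage 1 (toy): wrap pre-parsed (subject, attribute, value) tuples."""
    return [Keypoint(s, a, v, source) for s, a, v in records]


def verify_crossmodal(text_kps, chart_kps):
    """Stage 2 (toy): drop chart keypoints that contradict the text.

    A chart keypoint is kept when the text is silent about it or
    states the same value for the same (subject, attribute).
    """
    text_values = {(k.subject, k.attribute): k.value for k in text_kps}
    verified = []
    for kp in chart_kps:
        stated = text_values.get((kp.subject, kp.attribute))
        if stated is None or stated == kp.value:
            verified.append(kp)
    return verified


def generate_qa(keypoints):
    """Stage 3 (toy): turn each verified keypoint into a templated QA pair."""
    return [
        {
            "question": f"What is the {kp.attribute} of {kp.subject}?",
            "answer": kp.value,
            "source": kp.source,
        }
        for kp in keypoints
    ]


# Made-up example data, for illustration only.
text_kps = extract_keypoints([("Region A", "2023 growth", "12%")], "text")
chart_kps = extract_keypoints(
    [("Region A", "2023 growth", "12%"), ("Region B", "2023 growth", "7%")],
    "chart",
)
qa_pairs = generate_qa(verify_crossmodal(text_kps, chart_kps))
```

Keeping the stages as separate functions mirrors the ordering in the abstract: generation only ever sees keypoints that passed verification.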
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
