Benchmarking Multimodal RAG through a Chart-based Document Question-Answering Generation Framework

Yuming Yang; Jiang Zhong; Li Jin; Jingwang Huang; Jingpeng Gao; Qing Liu; Yang Bai; Jingyuan Zhang; Rui Jiang; Kaiwen Wei

arXiv:2502.14864 · cs.AI · February 21, 2025

TL;DR

This paper introduces a benchmark and a data-generation framework for evaluating multimodal retrieval-augmented generation (MRAG) on chart-based documents, and uses them to expose the limitations of current retrieval methods and models.

Contribution

The work presents CHARGE, a semi-automatic framework for generating chart-based QA data, and introduces Chart-MRAG Bench, a new benchmark for complex visual-text reasoning tasks.

Findings

01 Unified retrieval methods struggle with chart data

02 State-of-the-art models achieve only 58.19% correctness

03 Models show bias towards text over visual information

Abstract

Multimodal Retrieval-Augmented Generation (MRAG) enhances reasoning capabilities by integrating external knowledge. However, existing benchmarks primarily focus on simple image-text interactions, overlooking complex visual formats like charts that are prevalent in real-world applications. In this work, we introduce a novel task, Chart-based MRAG, to address this limitation. To semi-automatically generate high-quality evaluation samples, we propose CHARt-based document question-answering GEneration (CHARGE), a framework that produces evaluation data through structured keypoint extraction, crossmodal verification, and keypoint-based generation. By combining CHARGE with expert validation, we construct Chart-MRAG Bench, a comprehensive benchmark for chart-based MRAG evaluation, featuring 4,738 question-answering pairs across 8 domains from real-world documents. Our evaluation reveals three…
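
The abstract names three CHARGE stages in order: structured keypoint extraction, crossmodal verification, and keypoint-based generation. The sketch below only illustrates how such a three-stage pipeline could be composed. It is not the authors' implementation: the `Keypoint` and `QAPair` types, the `run_pipeline` function, and the toy stage functions are all hypothetical, and the real stages are presumably model-driven rather than rule-based.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Keypoint:
    """One atomic fact pulled from a document (hypothetical structure)."""
    text: str
    modality: str  # illustrative labels, e.g. "text" or "chart"


@dataclass(frozen=True)
class QAPair:
    """A generated question with its reference answer (hypothetical)."""
    question: str
    answer: str


def run_pipeline(
    document: str,
    extract: Callable[[str], List[Keypoint]],
    verify: Callable[[Keypoint, str], bool],
    generate: Callable[[List[Keypoint]], List[QAPair]],
) -> List[QAPair]:
    """Compose the three stages named in the abstract, in order."""
    keypoints = extract(document)                             # stage 1: extraction
    verified = [k for k in keypoints if verify(k, document)]  # stage 2: verification
    return generate(verified)                                 # stage 3: generation


# Toy stand-ins so the sketch runs end to end. These rules are assumptions
# made for illustration and say nothing about how CHARGE implements a stage.
def toy_extract(doc: str) -> List[Keypoint]:
    # Treat each sentence as one text keypoint.
    return [Keypoint(s.strip(), "text") for s in doc.split(".") if s.strip()]


def toy_verify(kp: Keypoint, doc: str) -> bool:
    # Keep only keypoints that contain a digit.
    return any(ch.isdigit() for ch in kp.text)


def toy_generate(kps: List[Keypoint]) -> List[QAPair]:
    # Turn each surviving keypoint into one question-answer pair.
    return [QAPair(f"What does the document state about: {k.text}?", k.text) for k in kps]


pairs = run_pipeline(
    "Sales rose 12% in 2023. Outlook is unclear.",
    toy_extract,
    toy_verify,
    toy_generate,
)
```

Passing the stages in as callables keeps the composition separate from any particular extraction or verification method, so each stage can be swapped without touching the others.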

Code & Models

Repositories

nomothings/charge (Official)

Datasets

ymyang/Chart-MRAG (dataset, 308 downloads)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems