Advancing Multimodal Large Language Models in Chart Question Answering with Visualization-Referenced Instruction Tuning

Xingchen Zeng; Haichuan Lin; Yilin Ye; Wei Zeng

arXiv:2407.20174 · cs.CV · August 13, 2024 · 1 citation

TL;DR

This paper introduces a visualization-referenced instruction tuning method for multimodal large language models to improve chart question answering, addressing data imbalance and chart-specific adaptation issues, resulting in superior performance on benchmarks.

Contribution

The paper proposes a novel data filtering and augmentation pipeline combined with a mixture-of-resolution training strategy to enhance MLLMs for chart question answering tasks.

Findings

01 The model outperforms state-of-the-art methods on benchmarks.

02 Enriched data improves fine-grained visual recognition.

03 Fewer training examples are needed for high performance.

Abstract

Emerging multimodal large language models (MLLMs) exhibit great potential for chart question answering (CQA). Recent efforts primarily focus on scaling up training datasets (i.e., charts, data tables, and question-answer (QA) pairs) through data collection and synthesis. However, our empirical study on existing MLLMs and CQA datasets reveals notable gaps. First, current data collection and synthesis focus on data volume and lack consideration of fine-grained visual encodings and QA tasks, resulting in unbalanced data distribution divergent from practical CQA scenarios. Second, existing work follows the training recipe of the base MLLMs initially designed for natural images, under-exploring the adaptation to unique chart characteristics, such as rich text elements. To fill the gap, we propose a visualization-referenced instruction tuning approach to guide the training dataset enhancement…
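The data imbalance mentioned in the abstract can be illustrated with a toy example. The sketch below is not the paper's filtering pipeline; it is a minimal illustration, assuming each QA record carries a hypothetical `chart_type` field, of how one might measure a skewed distribution and cap over-represented groups. The function name `balanced_subsample` and the sample records are invented for this example.

```python
import random
from collections import Counter, defaultdict


def balanced_subsample(records, key, per_group, seed=0):
    """Keep at most `per_group` records for each distinct value of `key`.

    Illustrative only: a simple per-group cap, not the method from the paper.
    """
    rng = random.Random(seed)  # seeded so the selection is reproducible
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    selected = []
    for name in sorted(groups):  # sorted for a deterministic group order
        members = groups[name]
        rng.shuffle(members)
        selected.extend(members[:per_group])
    return selected


# Hypothetical toy dataset: bar charts dominate, pie charts are rare.
records = (
    [{"chart_type": "bar", "question": f"bar-q{i}"} for i in range(8)]
    + [{"chart_type": "line", "question": f"line-q{i}"} for i in range(3)]
    + [{"chart_type": "pie", "question": f"pie-q{i}"} for i in range(1)]
)

before = Counter(r["chart_type"] for r in records)
subset = balanced_subsample(records, key="chart_type", per_group=3)
after = Counter(r["chart_type"] for r in subset)

print("before:", dict(before))
print("after: ", dict(after))
```

A per-group cap only trims over-represented groups; rare groups such as `pie` stay rare, which is why capping alone does not fix imbalance and some form of augmentation is also needed.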

Code & Models

Repositories

zengxingchen/chartqa-mllm
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques

Methods: Balanced Selection · Focus · ALIGN