Graph-Based Multimodal Contrastive Learning for Chart Question Answering

Yue Dai; Soyeon Caren Han; Wei Liu

arXiv:2501.04303 · cs.CL · April 8, 2025

Open Access

TL;DR

This paper presents a novel multimodal graph framework with contrastive learning and tailored prompts to improve chart question answering, effectively modeling chart components and their relationships for better reasoning.

Contribution

It introduces a joint scene graph framework with contrastive learning and CoT prompts, advancing multimodal chart reasoning beyond prior methods.

Findings

01 Significant performance improvements on ChartQA, OpenCQA, and ChartX benchmarks.

02 Effective modeling of chart elements and their relationships.

03 Enhanced zero-shot reasoning with tailored Chain of Thought prompts.

Abstract

Chart question answering (ChartQA) is challenged by the heterogeneous composition of chart elements and the subtle data patterns they encode. This work introduces a novel joint multimodal scene graph framework that explicitly models the relationships among chart components and their underlying structures. The framework integrates both visual and textual graphs to capture structural and semantic characteristics, while a graph contrastive learning strategy aligns node representations across modalities, enabling their seamless incorporation into a transformer decoder as soft prompts. Moreover, a set of tailored Chain of Thought (CoT) prompts is proposed to enhance multimodal large language models (MLLMs) in zero-shot scenarios by mitigating hallucinations. Extensive evaluations on benchmarks including ChartQA, OpenCQA, and ChartX demonstrate significant performance improvements and validate…
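The abstract does not state which contrastive objective is used to align the visual and textual graph nodes. As a rough illustration only, the sketch below assumes a symmetric InfoNCE loss over paired node embeddings, a common choice for cross-modal alignment. The function names, the temperature value, and the assumption that node i in one graph pairs with node i in the other are all hypothetical and are not taken from the paper.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def _cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(visual_nodes, textual_nodes, temperature=0.1):
    """Symmetric InfoNCE loss over paired node embeddings.

    Assumption (not from the paper): visual_nodes[i] and textual_nodes[i]
    describe the same chart element and form a positive pair; every other
    pairing in the batch is treated as a negative.
    """
    if len(visual_nodes) != len(textual_nodes) or not visual_nodes:
        raise ValueError("need equal, non-empty lists of node embeddings")
    v = [_normalize(x) for x in visual_nodes]
    t = [_normalize(x) for x in textual_nodes]
    # Temperature-scaled cosine similarities: logits[i][j] = <v_i, t_j> / temperature
    logits = [
        [sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t]
        for vi in v
    ]
    # Transpose to score the text-to-visual direction as well.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (_cross_entropy_diagonal(logits) + _cross_entropy_diagonal(logits_t))
```

Under these assumptions, the loss is small when matching nodes from the two graphs have similar embeddings and large when they do not, which is the sense in which such an objective "aligns" representations across modalities. The aligned embeddings could then be projected and passed to a decoder as soft prompts, a step this sketch does not cover.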

Taxonomy

Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Topic Modeling

Methods: Contrastive Learning