Graph-Based Multimodal Contrastive Learning for Chart Question Answering
Yue Dai, Soyeon Caren Han, Wei Liu

TL;DR
This paper presents a novel multimodal graph framework with contrastive learning and tailored prompts to improve chart question answering, effectively modeling chart components and their relationships for better reasoning.
Contribution
It introduces a joint scene graph framework with contrastive learning and CoT prompts, advancing multimodal chart reasoning beyond prior methods.
Findings
Significant performance improvements on ChartQA, OpenCQA, and ChartX benchmarks.
Effective modeling of chart elements and their relationships.
Enhanced zero-shot reasoning with tailored Chain of Thought prompts.
Abstract
Chart question answering (ChartQA) is challenged by the heterogeneous composition of chart elements and the subtle data patterns they encode. This work introduces a novel joint multimodal scene graph framework that explicitly models the relationships among chart components and their underlying structures. The framework integrates both visual and textual graphs to capture structural and semantic characteristics, while a graph contrastive learning strategy aligns node representations across modalities, enabling their seamless incorporation into a transformer decoder as soft prompts. Moreover, a set of tailored Chain of Thought (CoT) prompts is proposed to enhance multimodal large language models (MLLMs) in zero-shot scenarios by mitigating hallucinations. Extensive evaluations on benchmarks including ChartQA, OpenCQA, and ChartX demonstrate significant performance improvements and validate…
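The abstract does not spell out the contrastive objective, so the following is only a minimal sketch of one common way to align node representations across two modalities: a symmetric InfoNCE loss. It assumes that row i of the visual-graph embeddings and row i of the textual-graph embeddings describe the same chart element. The function names, the temperature value, and the toy vectors are illustrative assumptions, not the authors' implementation.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of selecting positives[i] for anchors[i]
    among all candidate positives, using cosine similarity as the score."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, anchor in enumerate(a):
        logits = [sum(x * y for x, y in zip(anchor, cand)) / temperature
                  for cand in p]
        peak = max(logits)  # subtract the max for numerical stability
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def symmetric_alignment_loss(visual_nodes, text_nodes, temperature=0.1):
    """Average of the visual-to-text and text-to-visual InfoNCE terms."""
    return 0.5 * (info_nce(visual_nodes, text_nodes, temperature)
                  + info_nce(text_nodes, visual_nodes, temperature))


# Hypothetical toy embeddings: three chart elements, three dimensions each.
visual = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.1], [0.1, 0.0, 1.0]]
text_aligned = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
# Same text embeddings, but paired with the wrong visual nodes.
text_shuffled = [text_aligned[1], text_aligned[2], text_aligned[0]]

aligned_loss = symmetric_alignment_loss(visual, text_aligned)
shuffled_loss = symmetric_alignment_loss(visual, text_shuffled)
```

In this toy setup the correctly paired embeddings yield a lower loss than the mismatched pairing, which is the signal such an objective would use to pull matching visual and textual nodes together. How the aligned node representations are then fed to the decoder as soft prompts is not described in the text above and is left out of the sketch.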
Taxonomy
Topics: Text and Document Classification Technologies · Natural Language Processing Techniques · Topic Modeling
Methods: Contrastive Learning
