ChartPoint: Guiding MLLMs with Grounding Reflection for Chart Reasoning

Zhengzhuo Xu; SiNan Du; Yiyan Qi; Siwen Lu; Chengjin Xu; Chun Yuan; Jian Guo

arXiv:2512.00305·cs.AI·December 2, 2025

Open Access

TL;DR

This paper introduces ChartPoint, a method that enhances multimodal large language models' ability to reason about charts by integrating visual grounding through bounding boxes and re-rendering, addressing the limitations of OCR-based content extraction.

Contribution

It proposes PointCoT, a novel approach combining reflective reasoning with visual grounding, and creates a large dataset for training models to improve chart comprehension and reasoning.

Findings

01 Models outperform state-of-the-art on chart benchmarks.

02 Introduction of a new dataset with step-by-step reasoning annotations.

03 Enhanced reasoning accuracy in chart comprehension tasks.

Abstract

Multimodal Large Language Models (MLLMs) have emerged as powerful tools for chart comprehension. However, they heavily rely on extracted content via OCR, which leads to numerical hallucinations when chart textual annotations are sparse. While existing methods focus on scaling instructions, they fail to address the fundamental challenge, i.e., reasoning with visual perception. In this paper, we identify a critical observation: MLLMs exhibit weak grounding in chart elements and proportional relationships, as evidenced by their inability to localize key positions to match their reasoning. To bridge this gap, we propose PointCoT, which integrates reflective interaction into chain-of-thought reasoning in charts. By prompting MLLMs to generate bounding boxes and re-render charts based on location annotations, we establish connections between textual reasoning steps and visual grounding…
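
The grounding step described in the abstract (the model emits a bounding box, and the chart is re-rendered with that location marked) can be illustrated with a toy sketch. This is not the authors' implementation: the `[x1, y1, x2, y2]` pixel-coordinate box format, the function names `parse_box` and `draw_box`, and the use of a plain 2D list in place of a real chart image are all assumptions made for illustration.

```python
import re

def parse_box(text):
    """Extract the first [x1, y1, x2, y2] integer box from model output text.

    The bracketed four-integer format is an assumption, not the paper's spec.
    """
    m = re.search(r"\[\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\]", text)
    if m is None:
        return None
    x1, y1, x2, y2 = (int(g) for g in m.groups())
    # Normalize so (x1, y1) is the top-left corner and (x2, y2) the bottom-right.
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))

def draw_box(pixels, box, mark=1):
    """Return a copy of a 2D pixel grid with the box outline drawn on it.

    The box is clipped to the grid; a box entirely outside leaves the copy unchanged.
    """
    height, width = len(pixels), len(pixels[0])
    x1, y1, x2, y2 = box
    x1, x2 = max(0, x1), min(width - 1, x2)
    y1, y2 = max(0, y1), min(height - 1, y2)
    out = [row[:] for row in pixels]
    if x1 > x2 or y1 > y2:
        return out
    for x in range(x1, x2 + 1):   # top and bottom edges
        out[y1][x] = mark
        out[y2][x] = mark
    for y in range(y1, y2 + 1):   # left and right edges
        out[y][x1] = mark
        out[y][x2] = mark
    return out

# Hypothetical reasoning step that cites a location in the chart.
step = "The tallest bar is located at [1, 1, 3, 2]."
chart = [[0] * 5 for _ in range(4)]          # stand-in for a chart image
annotated = draw_box(chart, parse_box(step))
for row in annotated:
    print("".join("#" if p else "." for p in row))
```

In a reflective loop of the kind the abstract describes, the annotated image would be shown back to the model so it can check whether the marked region matches its textual reasoning step; that feedback call is omitted here because its interface is not given in the source.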

Taxonomy

Topics: Handwritten Text Recognition Techniques · Data Visualization and Analytics · Multimodal Machine Learning Applications