CharTool: Tool-Integrated Visual Reasoning for Chart Understanding
Situo Zhang, Yifan Zhang, Zichen Zhu, Da Ma, Lei Pan, Danyang Zhang, Zihan Zhao, Lu Chen, Kai Yu

TL;DR
CharTool enhances multimodal large language models' ability to understand and reason about charts by integrating external tools and training on diverse synthesized and real-world chart data.
Contribution
It introduces the DuoChart data pipeline and the CharTool framework, enabling tool-augmented reasoning for improved chart understanding in MLLMs.
Findings
CharTool-7B outperforms its base model by +8.0% on CharXiv reasoning.
CharTool achieves a +9.78% gain on ChartQAPro.
The method generalizes well to out-of-domain visual math benchmarks.
Abstract
Charts are ubiquitous in scientific and financial literature for presenting structured data. However, chart reasoning remains challenging for multimodal large language models (MLLMs) due to the lack of high-quality training data, as well as the need for fine-grained visual grounding and precise numerical computation. To address these challenges, we first propose DuoChart, a scalable dual-source data pipeline that combines synthesized charts with real-world charts to construct diverse, high-quality chart training data. We then introduce CharTool, which equips MLLMs with external tools, including image cropping for localized visual perception and code-based computation for accurate numerical reasoning. Through agentic reinforcement learning on DuoChart, CharTool learns tool-integrated reasoning grounded in chart content. Extensive experiments on six chart benchmarks show that our method…
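The two tool types named in the abstract, image cropping and code-based computation, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `crop_image`, `compute`, `TOOLS`, and `call_tool` are hypothetical, the image is modelled as a plain list of pixel rows, and the computation tool accepts only simple arithmetic expressions rather than arbitrary code.

```python
import ast
import operator


def crop_image(pixels, left, top, right, bottom):
    """Hypothetical cropping tool: return the region [top:bottom, left:right].

    `pixels` is a list of rows, so the sketch needs no imaging library.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Clamp the box to the image bounds so a slightly oversized request still works.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("empty crop region")
    return [row[left:right] for row in pixels[top:bottom]]


# Arithmetic operators the hypothetical computation tool is willing to evaluate.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}


def compute(expression):
    """Hypothetical computation tool: evaluate plain arithmetic without eval()."""

    def _eval(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# Hypothetical registry mapping a tool name requested by a model to a function.
TOOLS = {"crop": crop_image, "compute": compute}


def call_tool(name, **kwargs):
    """Dispatch one tool request and return its result."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**kwargs)


if __name__ == "__main__":
    # A 3x4 toy "image" whose pixel values encode row and column.
    image = [[r * 10 + c for c in range(4)] for r in range(3)]
    print(call_tool("crop", pixels=image, left=1, top=0, right=3, bottom=2))
    # Percentage change between two values read off a chart (made-up numbers).
    print(call_tool("compute", expression="(42.5 - 40.0) / 40.0 * 100"))
```

In a setup like this, a model would emit a tool name with arguments, receive the cropped region or numeric result, and continue reasoning from it. How CharTool actually formats tool calls and returns observations is not described in the text above.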
