ChartAgent: A Chart Understanding Framework with Tool Integrated Reasoning
Boran Wang, Xinming Wang, Yi Chen, Xiang Li, Jian Xu, Jing Yuan, Chenglin Liu

TL;DR
ChartAgent is a modular framework that enhances automated chart understanding by decomposing analysis into observable steps and integrating various tools, improving robustness and transparency in visual data interpretation.
Contribution
The paper introduces ChartAgent, a novel framework that employs Tool-Integrated Reasoning and modular tools for systematic, transparent, and robust chart analysis beyond reliance on textual annotations.
Findings
Significantly improves robustness under sparse annotation conditions
Provides transparent and verifiable intermediate outputs
Achieves systematic visual parsing across diverse chart types
Abstract
With their high information density and intuitive readability, charts have become the de facto medium for data analysis and communication across disciplines. Recent multimodal large language models (MLLMs) have made notable progress in automated chart understanding, yet they remain heavily dependent on explicit textual annotations, and their performance degrades markedly when key numerals are absent. To address this limitation, we introduce ChartAgent, a chart understanding framework grounded in Tool-Integrated Reasoning (TIR). Inspired by human cognition, ChartAgent decomposes complex chart analysis into a sequence of observable, replayable steps. Supporting this architecture is an extensible, modular tool library comprising more than a dozen core tools, such as key element detection, instance segmentation, and optical character recognition (OCR), which the agent dynamically orchestrates…
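As a rough illustration of what "observable, replayable steps" over a tool library could look like, here is a minimal Python sketch. It is not the paper's implementation: the `ToolRunner` class, the three stub tools, and the toy `chart` dictionary are all hypothetical names invented for this example, and real tools would operate on images rather than on precomputed pixel measurements.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    """One recorded tool call: which tool ran, with what arguments, and its output."""
    tool: str
    args: Dict[str, Any]
    output: Any


@dataclass
class ToolRunner:
    """Hypothetical registry that logs every tool call so the run can be inspected or replayed."""
    tools: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    trace: List[Step] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def call(self, name: str, **args: Any) -> Any:
        output = self.tools[name](**args)
        self.trace.append(Step(name, args, output))  # the intermediate result stays visible
        return output

    def replay(self) -> List[Any]:
        # Re-run each recorded step with its recorded arguments.
        return [self.tools[s.tool](**s.args) for s in self.trace]


# Stub tools (assumptions): stand-ins for detection and axis-reading components.
def detect_bars(chart: Dict[str, Any]) -> List[int]:
    return list(chart["bar_heights_px"])


def read_axis_scale(chart: Dict[str, Any]) -> float:
    (px0, v0), (px1, v1) = chart["axis_ticks"]
    return (px1 - px0) / (v1 - v0)  # pixels per data unit


def pixels_to_values(heights_px: List[int], px_per_unit: float) -> List[float]:
    return [h / px_per_unit for h in heights_px]


runner = ToolRunner()
for tool in (detect_bars, read_axis_scale, pixels_to_values):
    runner.register(tool.__name__, tool)

# Toy chart with no numeric bar labels: values must be recovered from geometry.
chart = {"bar_heights_px": [50, 120, 80], "axis_ticks": [(0, 0.0), (100, 20.0)]}

heights = runner.call("detect_bars", chart=chart)
scale = runner.call("read_axis_scale", chart=chart)
values = runner.call("pixels_to_values", heights_px=heights, px_per_unit=scale)

for step in runner.trace:
    print(step.tool, "->", step.output)
```

In this sketch the bar values are estimated from pixel heights and two axis ticks rather than from printed numerals, and every intermediate output is kept in `runner.trace`, so a reader can check each step or call `runner.replay()` to reproduce it.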
Taxonomy
Topics: Handwritten Text Recognition Techniques · Topic Modeling · Multimodal Machine Learning Applications
