ChartReasoner: Code-Driven Modality Bridging for Long-Chain Reasoning in Chart Question Answering

Caijun Jia; Nan Xu; Jingxuan Wei; Qingli Wang; Lei Wang; Bihui Yu; Junnan Zhu

arXiv:2506.10116 · cs.CL · June 13, 2025

Open Access 3 Reviews

TL;DR

ChartReasoner introduces a novel two-stage framework that converts charts into structured code for precise, interpretable reasoning, significantly improving visual reasoning in chart question answering tasks.

Contribution

The paper presents a new code-driven approach for visual reasoning over charts, including a high-fidelity chart-to-code conversion model and a scalable data synthesis pipeline for training.

Findings

01

Achieves performance comparable to state-of-the-art models with fewer parameters.

02

Effectively preserves chart details for accurate reasoning.

03

Demonstrates strong results on four public benchmarks.

Abstract

Recently, large language models have shown remarkable reasoning capabilities by performing long-chain reasoning before responding. However, how to extend this capability to visual reasoning tasks remains an open challenge. Existing multimodal reasoning approaches transform visual reasoning tasks into textual reasoning tasks via several image-to-text conversions, which often lose critical structural and semantic information embedded in visualizations, especially for tasks like chart question answering that require a large amount of visual detail. To bridge this gap, we propose ChartReasoner, a novel code-driven two-stage framework designed to enable precise, interpretable reasoning over charts. We first train a high-fidelity model to convert diverse chart images into structured ECharts code, preserving both layout and data semantics as losslessly as possible. Then, we design a general chart…
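As a rough illustration of why a code intermediate can keep details that a free-text description may drop, the sketch below encodes a hypothetical bar chart as an ECharts `option` object (the JSON configuration ECharts renders from, where category labels sit in `xAxis.data` and plotted values in each entry of `series`). Two small helpers, `series_table` and `max_gap` (names are ours and hypothetical), then answer a question by reading exact values from that code. The chart, its values, and the helpers are assumptions for illustration only; this is not ChartReasoner's Chart2Code model or its reasoning pipeline.

```python
import json

# Hypothetical ECharts option for a simple grouped bar chart (illustrative
# values, not taken from the paper). Category labels live in xAxis.data and
# the plotted values live in each series' data list.
ECHARTS_OPTION = json.loads("""
{
  "title": {"text": "Quarterly revenue"},
  "xAxis": {"type": "category", "data": ["Q1", "Q2", "Q3", "Q4"]},
  "yAxis": {"type": "value"},
  "series": [
    {"name": "2023", "type": "bar", "data": [120, 150, 90, 180]},
    {"name": "2024", "type": "bar", "data": [130, 140, 110, 210]}
  ]
}
""")


def series_table(option):
    """Flatten a category-axis option into {series_name: {category: value}}."""
    categories = option["xAxis"]["data"]
    table = {}
    for series in option["series"]:
        # Pair each category label with the value plotted for it.
        table[series["name"]] = dict(zip(categories, series["data"]))
    return table


def max_gap(option, series_name):
    """Return (category with the largest value, largest minus smallest value)."""
    values = series_table(option)[series_name]
    top = max(values, key=values.get)
    return top, values[top] - min(values.values())


if __name__ == "__main__":
    print(series_table(ECHARTS_OPTION))
    print(max_gap(ECHARTS_OPTION, "2024"))  # ('Q4', 100)
```

Because every value is read from the code rather than estimated from pixels or paraphrased in a caption, the arithmetic step is exact; the hard part, which this sketch does not attempt, is producing faithful ECharts code from the chart image in the first place.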

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The methodology is presented clearly, with enough detail to reproduce the dataset generation and model training.
- The annotated reasoning traces in the ChartThink dataset could be useful for future work.
- The method itself seems intuitive, with a clear motivation of "bridging" the text-vision modality gap by using code as an intermediate modality.

Weaknesses

- It is unclear whether translating charts to Apache ECharts code yields any tangible performance improvement. There are many existing reasoning LLMs that can take image inputs, including the QvQ-preview model the authors include in their main results. Why not simply pass the chart image itself to these multimodal reasoning LLMs and ask them to generate the reasoning trace?
- The gains from the ChartReasoner training are very small over Qwen2.5-VL 7B, which was the model used for fine-tuning…

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. The motivation of this paper is clear, demonstrating the significance of chart reasoning.
2. The performance is good, demonstrating the effectiveness of the method.

Weaknesses

1. In the Chart2Code stage, how do the authors ensure that the code preserves chart information that text cannot? In the quality-filtering stage, is the raw chart compared with the chart rendered from the code?
2. In Fig. 12-15, if the charts lack numeric annotations, can the LLM generate accurate approximations of the data in the ECode?
3. In the ChartThink construction process, it seems the reasoning process has not been verified, only…

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper is well-written and comprehensive, presenting a clear and detailed methodology.
2. The proposed Chart2Code and the ChartThink dataset provide valuable resources for the community.

Weaknesses

1. The proposed approach is largely incremental and lacks substantial novelty; the main contribution is the construction of datasets.
2. The performance gains are limited; for example, the method underperforms Chart-R1[1] on ChartQA.
3. Unlike approaches based on Python code, which are more widely applicable, the method relies on ECharts templates. This limits its ability to handle complex or non-standard charts, as well as real-world data not generated with ECharts.
4. The i…

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Data Visualization and Analytics