Chart-R1: Chain-of-Thought Supervision and Reinforcement for Advanced Chart Reasoner
Lei Chen, Xuanle Zhao, Zhixiong Zeng, Jing Huang, Yufeng Zhong, Lin Ma

TL;DR
Chart-R1 is a novel vision-language model designed for advanced chart reasoning, combining chain-of-thought supervision and reinforcement learning to improve understanding of complex, multi-faceted chart data.
Contribution
The paper introduces Chart-R1, which employs a two-stage training strategy with programmatic data synthesis, chain-of-thought supervision, and reinforcement fine-tuning for superior chart reasoning.
Findings
Outperforms existing chart reasoning models on benchmarks
Achieves competitive results with large-scale models
Demonstrates robustness across diverse chart types
Abstract
Chart reasoning presents unique challenges due to its inherent complexity, requiring precise numerical comprehension, multi-level visual understanding, and logical inference across interconnected data elements. Existing vision-language models often struggle with such reasoning tasks, particularly when handling multi-subchart scenarios and numerical sensitivity. To address these challenges, we introduce Chart-R1, a chart-domain vision-language model that leverages reinforcement fine-tuning for advanced chart reasoning. We first propose a programmatic data synthesis approach to generate high-quality step-by-step reasoning data with verifiable answer formats, covering diverse chart types and complexity levels. Our two-stage training strategy includes: (1) Chart-COT, which decomposes complex reasoning into interpretable subtasks through chain-of-thought supervision, and (2) Chart-RFT,…
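The programmatic data synthesis mentioned in the abstract, which one reviewer describes as generating the plotting code first and then deriving questions, reasoning paths, and answers from that code, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the authors' pipeline: a single bar chart, a random data table, and one hypothetical "difference" question. `make_table`, `plotting_code`, and `make_qa` are illustrative names, and the matplotlib code is only emitted as a string, never rendered.

```python
import random


# Hypothetical helper: a small random data table for a single bar chart.
def make_table(rng: random.Random, n_bars: int = 5) -> dict:
    labels = [f"Item {chr(65 + i)}" for i in range(n_bars)]
    return {label: rng.randint(10, 100) for label in labels}


# Emit plotting code for the table as a string; it is not executed here.
def plotting_code(table: dict, title: str) -> str:
    return (
        "import matplotlib.pyplot as plt\n"
        f"labels = {list(table)!r}\n"
        f"values = {list(table.values())!r}\n"
        "fig, ax = plt.subplots()\n"
        "ax.bar(labels, values)\n"
        f"ax.set_title({title!r})\n"
        "fig.savefig('chart.png')\n"
    )


# Derive a multi-step question, its reasoning, and a verifiable answer
# from the same table that the plotting code renders.
def make_qa(table: dict) -> dict:
    top = max(table, key=table.get)
    low = min(table, key=table.get)
    diff = table[top] - table[low]
    reasoning = (
        f"The tallest bar is {top} ({table[top]}). "
        f"The shortest bar is {low} ({table[low]}). "
        f"Difference: {table[top]} - {table[low]} = {diff}."
    )
    return {
        "question": "What is the difference between the largest and smallest values?",
        "reasoning": reasoning,
        "answer": str(diff),
    }


rng = random.Random(0)
table = make_table(rng)
sample = {"code": plotting_code(table, "Synthetic bar chart"), **make_qa(table)}
print(sample["question"], "->", sample["answer"])
```

Because the answer is computed from the same table that the code plots, it can be checked exactly, which is the property a verifiable answer format relies on.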
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
S1. The paper presents an adaptation of R1-style reinforcement learning to chart reasoning. The overall implementation is coherent, and the two-stage structure (Chart-COT + Chart-RFT) is technically sound. S2. Experiments cover multiple public and in-domain benchmarks, with consistent improvements over open-source and chart-specific baselines. S3. The data synthesis pipeline and ChartRQA benchmark could be useful for future work, provided they are released.
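R1-style reinforcement learning is commonly associated with GRPO as used in DeepSeek-R1: sample a group of completions per prompt, score each with a rule-based reward, and standardize the rewards within the group to obtain advantages. Whether Chart-RFT uses exactly this objective is an assumption here, as are the `<think>`/`<answer>` tags and the 0/1 reward; `rule_reward` and `group_advantages` are hypothetical names. The minimal sketch below covers only the reward and advantage step, not the policy-gradient update.

```python
import re
import statistics
from typing import List, Optional

# Assumed R1-style output format: reasoning in <think>, final answer in <answer>.
ANSWER_RE = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def extract_answer(completion: str) -> Optional[str]:
    """Return the answer text, or None if the completion breaks the format."""
    match = ANSWER_RE.fullmatch(completion.strip())
    return match.group(1).strip() if match else None


def rule_reward(completion: str, gold: str) -> float:
    """Hypothetical 0/1 reward: malformed outputs and wrong answers both score 0."""
    pred = extract_answer(completion)
    return 1.0 if pred is not None and pred == gold else 0.0


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one chart question whose gold answer is "15".
completions = [
    "<think>40 - 25 = 15</think><answer>15</answer>",
    "<think>40 - 20 = 20</think><answer>20</answer>",
    "15",  # correct value but missing the required tags
    "<think>Read both bars, then subtract.</think> <answer>15</answer>",
]
rewards = [rule_reward(c, "15") for c in completions]
advantages = group_advantages(rewards)
print(rewards)     # [1.0, 0.0, 0.0, 1.0]
print(advantages)  # roughly [1, -1, -1, 1]
```

If every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal; graded rewards make such ties less frequent.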
W1. The main motivation is that R1-style reinforcement learning has not yet been applied to chart reasoning, which is not a strong research question in itself. W2. The method closely follows existing R1-style frameworks with only minor domain-specific adjustments. W3. While the paper repeatedly emphasizes complex chart reasoning, it never clearly defines what constitutes “complex.” The distinction between complex reasoning and simpler chart understanding tasks remains ambiguous.
1. **Novel and Robust Data Generation:** The programmatic data synthesis approach—generating the plotting code first and then using the code to formulate complex questions, reasoning paths, and answers—effectively overcomes the limitations of existing methods, which often rely on distilling reasoning from weaker models or are constrained by the accuracy of chart-to-code parsers. This methodology enables the creation of a high-quality, diverse dataset for complex reasoning. 2. **Effective Two-St
1. More reasoning VLMs should be included: MMR1, VL-Cogito, VL-Rethinker, OpenVLThinker, R1-VL, and so on. 2. Missing quantitative comparison to contemporary reasoning VLMs: The paper successfully positions Chart-R1 as an advancement over general VLMs. However, the paper's results tables do not include a direct, quantitative head-to-head comparison of Chart-R1 against these contemporary reasoning VLMs on the complex ChartRQA benchmark. Including these comparisons is necessary to de
* The SFT dataset enhances Qwen2.5-VL performance across various chart understanding benchmarks, including ChartQA, CharXiv, and ChartQAPro. * The well-designed RL approach, incorporating appropriate reward functions, further elevates the model's performance, achieving state-of-the-art results on a diverse range of benchmarks.
* The data tables used to render the chart images are mainly sourced from arXiv, which limits the diversity of the dataset (e.g., topics). Furthermore, generating QA pairs exclusively from the code, without incorporating images, may lead to a dataset with fewer visual questions and a greater emphasis on the data itself. * In lines 142-156, the authors claim that their data synthesis method works better than existing approaches that either augment existing datasets with CoT or generate QA pairs
**High-Fidelity Dataset Generation via Code (ChartRQA).** ChartRQA is generated from executable plotting code rather than static images, ensuring perfect alignment between data and visuals, greater diversity (24 chart types), and complex multi-chart reasoning tasks. **Soft Accuracy Reward Design.** The reinforcement stage introduces *soft numeric* and *edit-distance* rewards, which better capture partial correctness and stabilize training, yielding superior performance across benchmarks. **St
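The review names *soft numeric* and *edit-distance* rewards but gives no formulas, so the following is a minimal sketch under assumptions rather than the paper's definitions: numeric answers earn credit that decays linearly with relative error, and other answers earn a normalized Levenshtein similarity. The decay shape, the number parsing, and all function names are hypothetical.

```python
from typing import Optional


def parse_number(text: str) -> Optional[float]:
    """Hypothetical parser: plain numbers with optional commas or a trailing %."""
    try:
        return float(text.strip().replace(",", "").rstrip("%"))
    except ValueError:
        return None


def soft_numeric_reward(pred: float, gold: float) -> float:
    """Assumed form: credit decays linearly with relative error, floored at 0."""
    rel_err = abs(pred - gold) / max(abs(gold), 1e-9)
    return max(0.0, 1.0 - rel_err)


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def edit_distance_reward(pred: str, gold: str) -> float:
    """1.0 for identical strings (ignoring case and outer whitespace), lower as edits accumulate."""
    p, g = pred.strip().lower(), gold.strip().lower()
    return 1.0 - levenshtein(p, g) / max(len(p), len(g), 1)


def soft_accuracy(pred: str, gold: str) -> float:
    """Use the numeric reward when both answers parse as numbers, else edit distance."""
    p_num, g_num = parse_number(pred), parse_number(gold)
    if p_num is not None and g_num is not None:
        return soft_numeric_reward(p_num, g_num)
    return edit_distance_reward(pred, gold)


print(soft_accuracy("48", "50"))          # close number, partial credit
print(soft_accuracy("bar", "bars"))       # 0.75
print(soft_accuracy("Line B", "line b"))  # 1.0
```

Compared with exact match, a near-miss such as 48 versus 50 still earns most of the credit, which is the kind of partial correctness the review says these rewards capture.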
**Limited base-model diversity.** Most experiments are on a single backbone (Qwen2.5-VL-7B). Broader validation across architectures and model sizes would strengthen claims about generality. **Incremental novelty of the training scheme.** The two-stage CoT + RL pipeline follows recent R1-style work. The main novelty appears to be dataset construction and chart-specific reward shaping, rather than a fundamentally new algorithm. **Small-scale analyses.** Some ablations (e.g., reward components, stage inte
1. The task of complex chart reasoning is important and has significant practical value for users in real-world data analysis scenarios. 2. The open-source nature of the dataset can positively impact future research in the chart reasoning community. 3. The paper is clearly written and easy to follow. 4. The authors provide code and implementation details, ensuring reproducibility. 5. The dataset generation pipeline is carefully designed to cover diverse chart types and reasoning complexities usi
While the authors state that they incorporate human verification and real arXiv charts, the dataset construction still heavily relies on LLMs. Such reliance raises concerns that synthetic data may introduce a domain gap from real-world chart data and that LLM-inherent biases could influence the data distribution and question formulation. Furthermore, it is unclear whether the proposed dataset is overly tailored to the Chart-R1 framework, which could limit its generality when used with other reas
