Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation

Minggui He; Mingchen Dai; Jian Zhang; Yilun Liu; Shimin Tao; Pufan Zeng; Osamu Yoshie; Yuya Ieiri

arXiv:2602.10880·cs.CV·February 12, 2026

Open Access

TL;DR

This paper introduces Chart Specification, a structured intermediate representation and reinforcement learning approach that significantly improves the accuracy and fidelity of chart-to-code generation by vision-language models, especially with limited training data.

Contribution

The paper proposes a structured intermediate representation (Chart Specification) and a reward mechanism built on it (Spec-Align Reward) that together improve structural fidelity in chart-to-code tasks, surpassing prior methods with less training data.

Findings

01. Outperforms previous approaches on three benchmarks.

02. Achieves up to 61.7% improvement with only 3K training samples.

03. Establishes new state-of-the-art results with 4K samples.

Abstract

Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms…
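
The abstract describes the Spec-Align Reward only at a high level, as fine-grained, verifiable feedback on structural correctness. The sketch below illustrates one way such a signal could be computed, assuming a chart specification can be flattened into a dictionary of structural fields. The function name `spec_align_reward`, the field names, and the exact-match scoring rule are all hypothetical and are not taken from the paper.

```python
from typing import Any, Dict


def spec_align_reward(pred: Dict[str, Any], ref: Dict[str, Any]) -> float:
    """Hypothetical reward: fraction of reference fields matched exactly.

    This is an illustrative stand-in, not the paper's reward definition.
    """
    if not ref:
        return 0.0
    # Count reference fields that the predicted spec reproduces exactly.
    matched = sum(
        1 for key, value in ref.items() if key in pred and pred[key] == value
    )
    return matched / len(ref)


# Hypothetical specs: the field names are assumptions chosen for illustration.
reference = {
    "chart_type": "bar",
    "num_series": 2,
    "has_legend": True,
    "x_label": "Year",
}
predicted = {
    "chart_type": "bar",
    "num_series": 3,  # structural mismatch
    "has_legend": True,
    "x_label": "Year",
}

reward = spec_align_reward(predicted, reference)
print(reward)  # 0.75
```

Scoring per field rather than per token means a prediction is rewarded for recovering the chart's structure even when the generated code differs textually from the reference, which is the shift from text imitation to structural supervision that the abstract argues for.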

Taxonomy

Topics: Handwritten Text Recognition Techniques · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis