Chart-CoCa: Self-Improving Chart Understanding of Vision LMs via Code-Driven Synthesis and Candidate-Conditioned Answering

Gongyao Jiang; Qiong Luo

arXiv:2508.11975·cs.AI·August 19, 2025

Open Access

TL;DR

This paper introduces Chart-CoCa, a self-improving framework for chart understanding in vision language models that uses code-driven synthetic data generation and candidate-conditioned answering to enhance accuracy without human labels.

Contribution

It presents a novel pipeline combining code-based chart synthesis and candidate-conditioned answering, enabling self-improvement without human-labeled data.
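The idea behind code-based synthesis can be illustrated with a minimal sketch: if one script both draws the chart and computes the answer from the same data, the chart, question, and answer stay aligned by construction. Everything below is an assumption made for illustration and is not taken from the paper: the hard-coded `GENERATED_SCRIPT` stands in for code a model would write, the SVG rendering replaces whatever plotting library the authors use, and `synthesize_triplet` is a hypothetical helper name.

```python
import textwrap

# Hypothetical stand-in for a model-written script. It defines the data,
# renders a chart from it, and derives the answer from the same data.
GENERATED_SCRIPT = textwrap.dedent('''
    data = {"Q1": 12, "Q2": 30, "Q3": 18}

    # Render a minimal bar chart as an SVG string (standard library only).
    bars = []
    for i, (label, value) in enumerate(data.items()):
        bars.append(
            f'<rect x="{20 + i * 40}" y="{100 - value * 2}" '
            f'width="30" height="{value * 2}"><title>{label}</title></rect>'
        )
    chart_svg = (
        '<svg xmlns="http://www.w3.org/2000/svg" width="160" height="100">'
        + "".join(bars)
        + "</svg>"
    )

    question = "Which category has the highest value?"
    answer = max(data, key=data.get)  # computed, not guessed
''')


def synthesize_triplet(script: str) -> dict:
    """Execute a chart script and collect the chart, question, and answer.

    Assumption: the script defines `chart_svg`, `question`, and `answer`.
    Real model-written code should run in a sandbox, not via a bare exec().
    """
    namespace: dict = {}
    exec(script, namespace)
    return {
        "chart": namespace["chart_svg"],
        "question": namespace["question"],
        "answer": namespace["answer"],
    }


triplet = synthesize_triplet(GENERATED_SCRIPT)
print(triplet["question"], "->", triplet["answer"])
```

Because the answer comes from executing code rather than from a model's free-form guess, a script that fails to run can simply be discarded, which is one plausible way such a pipeline avoids human labeling.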

Findings

01

Achieves an accuracy gain of up to 15.50 points over the initial VLM.

02

Demonstrates the effectiveness of synthetic data generation for chart understanding.

03

Shows that the self-improving paradigm enhances VLM performance.

Abstract

Vision Language Models (VLMs) often struggle with chart understanding tasks, particularly accurate chart description and complex reasoning. Synthetic data generation is a promising solution, but it usually faces the challenge of noisy labels. To address this challenge, we first introduce a chart synthesis pipeline that generates aligned chart-question-answer triplets through code generation and execution, ensuring the reliability of synthetic data without human intervention. Furthermore, inspired by test-time scaling, which increases the inference budget and thereby improves performance, we design a candidate-conditioned answering process. The VLM first generates multiple responses per query and then synthesizes the final answer by contextualizing these candidates. Experiments demonstrate significant improvements, with up to 15.50 points accuracy gain over the initial VLM, in a fully…
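The two-stage answering process described in the abstract can be sketched roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the prompt wording, the function names, and the `generate` callable (a stand-in for a VLM call, with the chart image omitted for brevity) are all hypothetical, and the toy model's rule of returning the most frequent candidate is only there to make the example runnable.

```python
import itertools
from collections import Counter
from typing import Callable, List


def build_conditioned_prompt(question: str, candidates: List[str]) -> str:
    """Place the sampled candidates in the context of a second query."""
    lines = [f"Question: {question}", "Candidate answers:"]
    lines += [f"{i}. {c}" for i, c in enumerate(candidates, start=1)]
    lines.append("Considering the candidates above, give the final answer.")
    return "\n".join(lines)


def candidate_conditioned_answer(
    generate: Callable[[str], str], question: str, num_candidates: int = 4
) -> str:
    """Stage 1: sample several responses. Stage 2: answer conditioned on them."""
    candidates = [generate(f"Question: {question}") for _ in range(num_candidates)]
    return generate(build_conditioned_prompt(question, candidates))


def make_toy_model(samples: List[str]) -> Callable[[str], str]:
    """Hypothetical model: cycles through canned samples on a plain query and
    returns the most frequent candidate when candidates are in the prompt."""
    canned = itertools.cycle(samples)

    def generate(prompt: str) -> str:
        if "Candidate answers:" in prompt:
            listed = [
                line.split(". ", 1)[1]
                for line in prompt.splitlines()
                if line[:1].isdigit()
            ]
            return Counter(listed).most_common(1)[0][0]
        return next(canned)

    return generate


toy_model = make_toy_model(["Q2", "Q3", "Q2", "Q2"])
final = candidate_conditioned_answer(
    toy_model, "Which category has the highest value?", num_candidates=4
)
print(final)
```

Unlike plain majority voting, the final answer here is produced by a second model call that reads the candidates, so a real model is free to reconcile or correct them rather than only count them.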

Taxonomy

Topics: AI-based Problem Solving and Planning · Semantic Web and Ontologies