Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
Jianzhu Bao, Haozhen Zhang, Kuicai Dong, Bozhi Wu, Sarthak Ketanbhai Modi, Zi Pong Lim, Yon Shin Teo, Wenya Wang

TL;DR
This paper introduces ChartCF, a data-efficient training framework that enhances chart understanding by leveraging counterfactual data synthesis, similarity-based sample selection, and multimodal preference optimization, achieving strong results with less data.
Contribution
ChartCF is a training framework that improves the counterfactual sensitivity of vision-language models for chart understanding while reducing the amount of training data required.
Findings
Achieves comparable or better performance than existing models on five benchmarks.
Uses significantly less training data than traditional supervised fine-tuning.
Captures fine-grained visual and semantic differences in charts.
Abstract
Vision-Language Models (VLMs) have demonstrated remarkable progress in chart understanding, largely driven by supervised fine-tuning (SFT) on increasingly large synthetic datasets. However, scaling SFT data alone is inefficient and overlooks a key property of charts: charts are programmatically generated visual artifacts, where small, code-controlled visual changes can induce drastic shifts in semantics and correct answers. Learning this counterfactual sensitivity requires VLMs to discriminate fine-grained visual differences, yet standard SFT treats training instances independently and provides limited supervision to enforce this behavior. To address this, we introduce ChartCF, a data-efficient training framework designed to enhance counterfactual sensitivity. ChartCF consists of: (1) a counterfactual data synthesis pipeline via code modification, (2) a chart similarity-based data…
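The idea of synthesizing counterfactuals via code modification can be illustrated with a minimal Python sketch. This is not the paper's pipeline: the chart spec format, the helper names (`answer_argmax`, `make_counterfactual`, `to_plot_code`), and the choice of edit (swapping the largest and smallest bars) are all hypothetical, chosen only to show how a small, code-controlled change to a chart can flip the correct answer to the same question.

```python
from copy import deepcopy

# Hypothetical chart spec; the paper's actual chart code format is not given here.
original = {"title": "Units sold", "categories": ["A", "B", "C"], "values": [12, 30, 21]}


def answer_argmax(spec):
    """Ground-truth answer to 'Which category has the highest value?'."""
    best = max(range(len(spec["values"])), key=lambda i: spec["values"][i])
    return spec["categories"][best]


def make_counterfactual(spec):
    """Assumed edit: swap the largest and smallest values so the answer changes."""
    cf = deepcopy(spec)  # leave the original spec untouched
    vals = cf["values"]
    hi = max(range(len(vals)), key=lambda i: vals[i])
    lo = min(range(len(vals)), key=lambda i: vals[i])
    vals[hi], vals[lo] = vals[lo], vals[hi]
    return cf


def to_plot_code(spec):
    """Emit matplotlib source for the spec as a string (not executed here)."""
    return (
        "import matplotlib.pyplot as plt\n"
        f"plt.bar({spec['categories']!r}, {spec['values']!r})\n"
        f"plt.title({spec['title']!r})\n"
        "plt.savefig('chart.png')\n"
    )


counterfactual = make_counterfactual(original)

# One question, two nearly identical charts, two different correct answers.
pair = {
    "question": "Which category has the highest value?",
    "original_code": to_plot_code(original),
    "counterfactual_code": to_plot_code(counterfactual),
    "original_answer": answer_argmax(original),
    "counterfactual_answer": answer_argmax(counterfactual),
}
```

The two code strings differ only in the order of the plotted values, yet the correct answer differs between them. A pair like this could plausibly serve as contrastive supervision, since a model must attend to the fine-grained visual difference to answer both charts correctly.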
