ChartZero: Synthetic Priors Enable Zero Shot Chart Data Extraction
Md Touhidul Islam, Yasir Mahmud, Sujan Kumar Saha, Mark Tehranipoor, Farimah Farahmandi

TL;DR
ChartZero is a zero-shot framework, trained on synthetic data, for robust end-to-end chart data extraction. It sidesteps the scarcity of real-world annotations and improves accuracy on complex, diverse chart styles.
Contribution
The paper proposes a synthetic-data training approach that combines a GOI loss with legend matching guided by a vision-language model (VLM), enabling zero-shot generalization without real-world annotations.
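To illustrate why synthetic training removes the need for manual annotation, here is a minimal sketch of a chart-spec generator in which the ground-truth data points are known by construction. This is not ChartZero's actual generator: the function name `make_synthetic_chart_spec`, the style fields, and the value ranges are all hypothetical, chosen only to show the idea of randomizing appearance while keeping exact labels.

```python
import random


def make_synthetic_chart_spec(seed, n_series=3, n_points=50):
    """Build a hypothetical line-chart spec: random styling plus exact ground truth.

    Illustrative assumption only; the paper's real generator is not described here.
    """
    rng = random.Random(seed)  # seeded so every chart is reproducible
    xs = [i / (n_points - 1) for i in range(n_points)]

    series = []
    for k in range(n_series):
        slope = rng.uniform(-1.0, 1.0)
        offset = rng.uniform(0.0, 1.0)
        noise = rng.uniform(0.0, 0.05)
        ys = [offset + slope * x + rng.gauss(0.0, noise) for x in xs]
        series.append({
            "label": f"series_{k}",
            "color": "#{:06x}".format(rng.randrange(0x1000000)),
            "linewidth": rng.uniform(0.5, 3.0),
            # The annotation comes for free: these are the exact plotted values.
            "points": list(zip(xs, ys)),
        })

    # Randomized appearance, mimicking the stylistic diversity of real charts.
    style = {
        "grid": rng.choice([True, False]),
        "legend_loc": rng.choice(
            ["upper left", "upper right", "lower left", "lower right", "outside"]
        ),
        "background": rng.choice(["white", "lightgray", "gradient"]),
    }
    return {"style": style, "series": series}


spec = make_synthetic_chart_spec(seed=0)
```

A renderer (not shown) would turn each spec into an image, and the `points` lists would serve as the training targets, so no human labeling is involved at any step.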
Findings
Outperforms existing methods on end-to-end chart reconstruction tasks.
Effectively handles complex backgrounds and fine visual details.
Introduces a new benchmark and metric for holistic chart data extraction.
Abstract
Automated data extraction from line charts remains fundamentally bottlenecked by extreme stylistic diversity and a severe scarcity of comprehensively annotated, real-world datasets. Current end-to-end pipelines depend heavily on costly manual annotations, crippling their ability to generalize across arbitrary aesthetics and grid layouts. Furthermore, existing models suffer from two critical failure modes during reconstruction. First, extracting thin, intersecting curves frequently causes structural fragmentation and the erasure of fine visual details, as standard architectures struggle against complex backgrounds. Second, semantic association is notoriously error-prone; current pipelines rely on rigid spatial heuristics that easily break down against the unpredictable legend placements of in-the-wild charts. Finally, measuring true progress is hindered by evaluation protocols that…
