GenPlot: Increasing the Scale and Diversity of Chart Derendering Data
Brendan Artley

TL;DR
GenPlot is a synthetic data generator that significantly expands the scale and diversity of chart derendering datasets, enhancing the training of visual language models for understanding various plot types.
Contribution
The paper introduces a scalable plot generator that can create billions of synthetic charts, addressing the fixed size of existing chart-derendering datasets.
Findings
Can generate billions of diverse synthetic plots
Improved performance on chart derendering benchmarks
Enhanced model robustness across plot types
Abstract
Vertical bar, horizontal bar, dot, scatter, and line plots provide a diverse set of visualizations to represent data. To understand these plots, one must be able to recognize textual components, locate data points in a plot, and process diverse visual contexts to extract information. In recent works such as Pix2Struct, MatCha, and DePlot, OCR-free chart-to-text translation has achieved state-of-the-art results on visual language tasks. These results highlight the importance of chart derendering as a pre-training objective, yet existing datasets provide only a fixed set of training examples. In this paper, we propose GenPlot, a plot generator that can generate billions of additional plots for chart derendering using synthetic data.
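As a rough illustration of the idea in the abstract, the sketch below samples random chart specifications for the five plot types and pairs each with a flattened text target of the kind a derendering model could be trained to predict. It is not GenPlot's implementation: the spec schema, the function names (sample_chart_spec, linearize, generate), the value ranges, and the target format are all assumptions made for this example, and rendering each spec to an image with a plotting library is left out.

```python
import random

# The five plot types come from the abstract; everything else in this
# sketch (schema, ranges, target format) is an illustrative assumption.
PLOT_TYPES = ["vertical_bar", "horizontal_bar", "dot", "scatter", "line"]


def sample_chart_spec(rng: random.Random) -> dict:
    """Sample one random chart specification (hypothetical schema).

    "x" holds the independent values and "y" the dependent values,
    regardless of how the chart would be oriented when drawn.
    """
    plot_type = rng.choice(PLOT_TYPES)
    n_points = rng.randint(3, 8)

    if plot_type == "scatter":
        # Scatter plots are numeric on both axes.
        xs = [round(rng.uniform(0, 100), 1) for _ in range(n_points)]
    else:
        # The other types get categorical labels on the independent axis.
        xs = [f"cat_{i}" for i in range(n_points)]

    if plot_type == "dot":
        # Dot plots are given small integer counts here.
        ys = [rng.randint(1, 10) for _ in range(n_points)]
    else:
        ys = [round(rng.uniform(0, 100), 1) for _ in range(n_points)]

    return {"type": plot_type, "x": xs, "y": ys}


def linearize(spec: dict) -> str:
    """Flatten the underlying data into a text target for derendering."""
    rows = " | ".join(f"{x}, {y}" for x, y in zip(spec["x"], spec["y"]))
    return f"<{spec['type']}> {rows}"


def generate(n: int, seed: int = 0):
    """Yield n (spec, target) pairs; a fixed seed makes the run repeatable."""
    rng = random.Random(seed)
    for _ in range(n):
        spec = sample_chart_spec(rng)
        yield spec, linearize(spec)


if __name__ == "__main__":
    for spec, target in generate(3, seed=42):
        print(target)
```

Because every example is drawn from a seeded random source rather than read from a fixed file, the number of distinct training pairs is limited only by how many are sampled, which is the property the abstract contrasts with existing fixed datasets.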
Taxonomy
Topics: Handwritten Text Recognition Techniques · Natural Language Processing Techniques · Video Analysis and Summarization
