PlotPick: AI-powered batch extraction of numerical data from scientific figures

Tommy Carstensen

arXiv:2605.06021 · cs.CV · May 8, 2026

TL;DR

PlotPick is an open-source tool that uses vision-language models (VLMs) to batch-extract structured tabular data from scientific figures. In the reported evaluation, general-purpose VLMs outperform the dedicated chart-to-table model DePlot on the ChartX and PlotQA benchmarks.

Contribution

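The paper presents PlotPick, which applies VLMs to batch extraction of structured tabular data from scientific figures, and evaluates six VLMs from three providers against DePlot on two established chart-to-table benchmarks (ChartX and PlotQA).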

Findings

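01 All six evaluated VLMs are reported to outperform the dedicated chart-to-table model DePlot on both ChartX and PlotQA.

02 On ChartX (restricted to bar charts, line charts, box plots, and histograms; n=300), VLMs achieve 88-96% recall versus 71% for DePlot.

03 The gap is largest on chart types absent from the dedicated models' training data: on box plots, DePlot achieves 24% RMSF1 while VLMs achieve 83-97%.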

Abstract

Systematic reviews and meta-analyses frequently require numerical data that authors report only as figures, yet manual digitisation is slow and does not scale. We present PlotPick, an open-source tool that uses vision-language models (VLMs) to batch-extract structured tabular data from scientific figures. We evaluate six VLMs from three providers on two established chart-to-table benchmarks (ChartX and PlotQA) and compare against the dedicated chart-to-table model DePlot. All six VLMs outperform DePlot on both benchmarks. On ChartX (restricted to bar charts, line charts, box plots, and histograms; n=300), VLMs achieve 88-96% recall versus 71% for DePlot. On PlotQA (n=529), VLMs achieve 86-99% RMSF1 versus 94% for DePlot. The gap is largest on chart types absent from the dedicated models' training data: on box plots, DePlot achieves 24% RMSF1 while VLMs achieve 83-97%. PlotPick is…
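To make the idea of scoring an extracted table against a ground-truth table concrete, here is a minimal Python sketch of a tolerance-based recall. It is an illustration only: it is not the RMSF1 metric, not the ChartX scoring procedure, and not code from PlotPick. The function name `value_recall`, the (series, category) cell keys, and the 5% relative tolerance are all assumptions made for this example.

```python
from typing import Dict, Tuple

# Hypothetical cell key: (series label, x-axis category).
Key = Tuple[str, str]


def value_recall(truth: Dict[Key, float],
                 predicted: Dict[Key, float],
                 rel_tol: float = 0.05) -> float:
    """Fraction of ground-truth cells recovered within a relative tolerance.

    Simplified illustration; the 5% default tolerance is an assumption,
    not a value taken from the paper.
    """
    if not truth:
        raise ValueError("ground-truth table is empty")
    hits = 0
    for key, true_value in truth.items():
        if key not in predicted:
            continue  # a missing cell counts as a miss
        # Scale the tolerance by the true value; fall back to an absolute
        # tolerance when the true value is zero.
        allowed = rel_tol * abs(true_value) if true_value != 0 else rel_tol
        if abs(predicted[key] - true_value) <= allowed:
            hits += 1
    return hits / len(truth)


# Made-up example data: four ground-truth cells, three extracted cells.
truth = {("A", "2020"): 10.0, ("A", "2021"): 20.0,
         ("B", "2020"): 5.0, ("B", "2021"): 8.0}
predicted = {("A", "2020"): 10.2, ("A", "2021"): 23.0,
             ("B", "2020"): 5.0}
print(value_recall(truth, predicted))  # 0.5
```

In this made-up example two of the four ground-truth cells are matched: one extracted value is off by more than 5% and one cell is missing, so the recall is 0.5. The benchmark metrics named in the abstract are more involved than this sketch.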

Code & Models

Repositories

https://plotpick.streamlit.app