Spatial Priming Outperforms Semantic Prompting: A Grid-Based Approach to Improving LLM Accuracy on Chart Data Extraction
Andrei Lazarev, Dmitrii Sedov, Alexander Galkin

TL;DR
This paper demonstrates that overlaying a coordinate grid onto charts significantly improves data extraction accuracy in multimodal LLMs, outperforming semantic prompting methods.
Contribution
The study introduces a simple spatial priming technique using grid overlays that outperforms semantic methods for chart data extraction with current multimodal models.
Findings
Grid-based spatial priming reduces extraction error from 25.5% to 19.5%, an absolute reduction of 6.0 percentage points (about 23.5% relative).
Semantic priming methods did not yield significant improvements.
Spatial context provision is more effective than semantic guidance for this task.
Abstract
The automated extraction of data from scientific charts is a critical task for large-scale literature analysis. While multimodal Large Language Models (LLMs) show promise, their accuracy on non-standardized charts remains a challenge. This raises a key research question: which strategy is more effective for improving model performance, high-level semantic priming or low-level spatial priming? This paper presents a comparative investigation of these two distinct strategies. We describe our exploratory experiments with semantic methods, such as a two-stage metadata-first framework and Chain-of-Thought, which failed to produce a statistically significant improvement. In contrast, we present a simple but highly effective spatial priming method: overlaying a coordinate grid onto the chart image before analysis. Our quantitative experiment on a synthetic dataset demonstrates that this…
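To make the grid-overlay idea concrete, here is a minimal sketch in pure Python. It is not the authors' implementation: the summary does not state the grid spacing, line color, or whether the grid lines carry coordinate labels, so the function names (`grid_positions`, `overlay_grid`), the default 10x10 layout, and the red line color are all illustrative assumptions. The image is modeled as a nested list of RGB tuples to keep the example dependency-free; in practice an imaging library would do the drawing.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]  # image[y][x], row-major


def grid_positions(size: int, n_cells: int) -> List[int]:
    """Pixel offsets of the interior lines that split `size` pixels into `n_cells` cells."""
    # Integer floor division keeps every offset strictly below `size`.
    return [i * size // n_cells for i in range(1, n_cells)]


def overlay_grid(image: Image, n_cols: int = 10, n_rows: int = 10,
                 color: Pixel = (255, 0, 0)) -> Image:
    """Return a copy of `image` with evenly spaced grid lines drawn on it.

    Hypothetical helper: grid density and color are assumptions,
    not values taken from the paper.
    """
    height, width = len(image), len(image[0])
    out = [list(row) for row in image]  # copy rows so the input stays untouched
    xs = grid_positions(width, n_cols)
    ys = grid_positions(height, n_rows)
    for y in range(height):          # vertical lines
        for x in xs:
            out[y][x] = color
    for y in ys:                     # horizontal lines
        out[y] = [color] * width
    return out


# Usage: a blank 100x50 white "chart" with a 4x2 grid.
blank = [[(255, 255, 255)] * 100 for _ in range(50)]
gridded = overlay_grid(blank, n_cols=4, n_rows=2)
```

The gridded image, rather than the original, would then be passed to the multimodal model, so that its value estimates can be anchored to visible reference lines.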
