From Dead Pixels to Editable Slides: Infographic Reconstruction into Native Google Slides via Vision-Language Region Understanding
Leonardo Gonzalez

TL;DR
This paper presents Images2Slides, an API pipeline that converts static infographics into editable Google Slides by extracting regions with vision-language models and reconstructing elements, enabling easy updates and reuse.
Contribution
It introduces a model-agnostic, region-based approach for infographic reconstruction into editable slides, supporting multiple vision-language backends and demonstrating high element recovery rates.
Findings
Achieves a 98.9% element recovery rate on a controlled benchmark of 29 programmatically generated infographic slides
Text transcription error rate of 3.3%
Layout fidelity: IoU of 0.364 for text regions
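For readers unfamiliar with the layout metric, intersection over union (IoU) between a predicted region and a ground-truth region can be computed as in the minimal sketch below. The `Box` type and the `iou` function are illustrative names, not taken from the paper, and this summary does not say how the paper matches predicted regions to ground truth or how it averages scores, so the sketch covers only the per-pair computation.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned box: top-left corner (x, y) plus width and height (hypothetical type)."""
    x: float
    y: float
    w: float
    h: float


def iou(a: Box, b: Box) -> float:
    """Return intersection-over-union of two boxes, in [0, 1]."""
    # Overlap along each axis, clamped at zero when the boxes do not intersect.
    overlap_w = max(0.0, min(a.x + a.w, b.x + b.w) - max(a.x, b.x))
    overlap_h = max(0.0, min(a.y + a.h, b.y + b.h) - max(a.y, b.y))
    intersection = overlap_w * overlap_h
    union = a.w * a.h + b.w * b.h - intersection
    # Two degenerate (zero-area) boxes have no meaningful overlap ratio.
    return intersection / union if union > 0 else 0.0
```

Under this definition, a predicted text box shifted sideways by half its width against an equally sized ground-truth box scores 1/3, which gives some intuition for how a value such as 0.364 can coexist with a high recovery rate.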
Abstract
Infographics are widely used to communicate information with a combination of text, icons, and data visualizations, but once exported as images their content is locked into pixels, making updates, localization, and reuse expensive. We describe Images2Slides, an API-based pipeline that converts a static infographic (PNG/JPG) into a native, editable Google Slides slide by extracting a region-level specification with a vision-language model (VLM), mapping pixel geometry into slide coordinates, and recreating elements using the Google Slides batch update API. The system is model-agnostic and supports multiple VLM backends via a common JSON region schema and deterministic postprocessing. On a controlled benchmark of 29 programmatically generated infographic slides with known ground-truth regions, Images2Slides achieves an overall element recovery rate of …
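The geometry-mapping and element-recreation steps described in the abstract might look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: the function names, the uniform-scale-and-center policy, and the default 16:9 page size (10 in x 5.625 in) are assumptions. The request dictionaries follow the shape of the Google Slides API `createShape` and `insertText` requests and would be sent in the `requests` list of a `presentations.batchUpdate` call.

```python
EMU_PER_INCH = 914_400
# Assumed default 16:9 page size; a real pipeline would read pageSize from the presentation.
SLIDE_W_EMU = 10 * EMU_PER_INCH   # 9,144,000 EMU
SLIDE_H_EMU = 5_143_500           # 5.625 inches


def px_to_emu(bbox, image_w, image_h, slide_w=SLIDE_W_EMU, slide_h=SLIDE_H_EMU):
    """Map a pixel bbox (x, y, w, h) to slide EMU, preserving aspect ratio.

    Assumption: the image is scaled uniformly to fit the slide and centered.
    """
    scale = min(slide_w / image_w, slide_h / image_h)
    offset_x = (slide_w - image_w * scale) / 2
    offset_y = (slide_h - image_h * scale) / 2
    x, y, w, h = bbox
    return (
        round(offset_x + x * scale),
        round(offset_y + y * scale),
        round(w * scale),
        round(h * scale),
    )


def text_box_requests(object_id, page_id, text, bbox_emu):
    """Build batchUpdate requests that create a text box and fill it with text."""
    x, y, w, h = bbox_emu
    return [
        {
            "createShape": {
                "objectId": object_id,
                "shapeType": "TEXT_BOX",
                "elementProperties": {
                    "pageObjectId": page_id,
                    "size": {
                        "width": {"magnitude": w, "unit": "EMU"},
                        "height": {"magnitude": h, "unit": "EMU"},
                    },
                    "transform": {
                        "scaleX": 1,
                        "scaleY": 1,
                        "translateX": x,
                        "translateY": y,
                        "unit": "EMU",
                    },
                },
            }
        },
        {"insertText": {"objectId": object_id, "insertionIndex": 0, "text": text}},
    ]


# Hypothetical region from a 1920x1080 infographic; IDs and text are placeholders.
box = px_to_emu((960, 540, 192, 108), image_w=1920, image_h=1080)
requests = text_box_requests("region_0001", "page_1", "Quarterly revenue", box)
```

Keeping the mapping as a pure function of the bounding box and image size fits the abstract's emphasis on deterministic postprocessing: the same region specification always yields the same slide geometry, regardless of which VLM backend produced it.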
Taxonomy
Topics: Handwritten Text Recognition Techniques · Interactive and Immersive Displays · Computer Graphics and Visualization Techniques
