Towards Automatic Parsing of Structured Visual Content through the Use of Synthetic Data
Lukas Scholch, Jonas Steinhauser, Maximilian Beichter, Constantin Seibold, Kailun Yang, Merlin Knäble, Thorsten Schwarz, Alexander Mädche, and Rainer Stiefelhagen

TL;DR
This paper introduces a synthetic dataset and a model for automatically extracting graph representations from structured visual content images, aiding accessibility and automated knowledge extraction.
Contribution
The work presents the Synthetic SVC (SSVC) dataset with 12,000 images and ground truths, enabling training of models for interpreting structured visual content without extensive manual annotation.
Findings
Model shows transferability from synthetic to real data
Synthetic dataset effectively trains models for SVC interpretation
Baseline results establish a foundation for future research
Abstract
Structured Visual Content (SVC), such as graphs, flow charts, and similar diagrams, is used by authors to illustrate various concepts. While such depictions help the average reader understand the content, images containing SVC are typically not machine-readable. This hinders not only automated knowledge aggregation but also the perception of the displayed information by visually impaired people. In this work, we propose a synthetic dataset containing SVC images together with their ground truths. We demonstrate its use with an application that automatically extracts a graph representation from an SVC image by training a model with common supervised learning methods. As there currently exist no large-scale public datasets for the detailed analysis of SVC, we propose the Synthetic SVC (SSVC) dataset comprising 12,000 images with respective bounding…
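To make the target output more concrete, here is a minimal sketch of how a graph representation extracted from a flow chart might be stored. The classes `Node` and `SVCGraph`, the `(x_min, y_min, x_max, y_max)` bounding-box convention, and the `describe` helper are hypothetical illustrations. They are not the SSVC annotation format or the authors' model output, neither of which the abstract specifies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One element of the diagram (e.g. a flow-chart box). Hypothetical schema."""
    node_id: int
    label: str
    # Assumed convention: (x_min, y_min, x_max, y_max) in pixel coordinates.
    bbox: tuple[float, float, float, float]


@dataclass
class SVCGraph:
    """Directed graph recovered from an SVC image (illustrative only, not the SSVC format)."""
    nodes: dict[int, Node] = field(default_factory=dict)
    edges: list[tuple[int, int]] = field(default_factory=list)

    def add_node(self, node: Node) -> None:
        self.nodes[node.node_id] = node

    def add_edge(self, src: int, dst: int) -> None:
        # Reject edges whose endpoints were never registered as nodes.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"edge ({src}, {dst}) references an unknown node")
        self.edges.append((src, dst))

    def adjacency(self) -> dict[int, list[int]]:
        """Return an adjacency list: node id -> ids of its successors."""
        adj: dict[int, list[int]] = {nid: [] for nid in self.nodes}
        for src, dst in self.edges:
            adj[src].append(dst)
        return adj

    def describe(self) -> list[str]:
        """Turn each edge into a short text line, e.g. for a screen reader."""
        return [f"{self.nodes[s].label} -> {self.nodes[d].label}" for s, d in self.edges]


if __name__ == "__main__":
    # A three-step flow chart: Start -> Check input -> End.
    g = SVCGraph()
    g.add_node(Node(0, "Start", (10, 10, 90, 40)))
    g.add_node(Node(1, "Check input", (10, 80, 90, 110)))
    g.add_node(Node(2, "End", (10, 150, 90, 180)))
    g.add_edge(0, 1)
    g.add_edge(1, 2)
    print(g.adjacency())
    print(g.describe())
```

Keeping the bounding boxes on the nodes preserves the link back to image regions, while the edge list carries the structure a knowledge-extraction pipeline would consume. The `describe` helper only hints at the accessibility use case by turning edges into short text.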
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
