Towards Automatic Parsing of Structured Visual Content through the Use of Synthetic Data

Lukas Scholch; Jonas Steinhauser; Maximilian Beichter; Constantin Seibold; Kailun Yang; Merlin Knäble; Thorsten Schwarz; Alexander Mädche; and Rainer Stiefelhagen

arXiv:2204.14136·cs.CV·May 2, 2022

TL;DR

This paper introduces a synthetic dataset and a model for automatically extracting graph representations from structured visual content images, aiding accessibility and automated knowledge extraction.

Contribution

The work presents the Synthetic SVC (SSVC) dataset with 12,000 images and ground truths, enabling training of models for interpreting structured visual content without extensive manual annotation.

Findings

01. Model shows transferability from synthetic to real data
02. Synthetic dataset effectively trains models for SVC interpretation
03. Baseline results establish a foundation for future research

Abstract

Structured Visual Content (SVC) such as graphs, flow charts, or the like are used by authors to illustrate various concepts. While such depictions allow the average reader to better understand the contents, images containing SVCs are typically not machine-readable. This, in turn, not only hinders automated knowledge aggregation, but also the perception of displayed information for visually impaired people. In this work, we propose a synthetic dataset, containing SVCs in the form of images as well as ground truths. We show the usage of this dataset by an application that automatically extracts a graph representation from an SVC image. This is done by training a model via common supervised learning methods. As there currently exist no large-scale public datasets for the detailed analysis of SVC, we propose the Synthetic SVC (SSVC) dataset comprising 12,000 images with respective bounding…
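
To make the idea of "extracting a graph representation from an SVC image" concrete, the following is a minimal sketch of one possible post-processing step. It is not the authors' method: it assumes that some detector has already produced node bounding boxes and arrow endpoints, and it simply attaches each arrow end to the nearest node center. The names `Box`, `nearest_node`, and `build_graph`, as well as the box and arrow formats, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def center(self):
        return ((self.x_min + self.x_max) / 2, (self.y_min + self.y_max) / 2)


def nearest_node(point, nodes):
    """Return the id of the node whose box center is closest to `point`."""
    px, py = point

    def squared_distance(item):
        cx, cy = item[1].center()
        return (cx - px) ** 2 + (cy - py) ** 2

    return min(nodes.items(), key=squared_distance)[0]


def build_graph(nodes, arrows):
    """Build a directed adjacency dict from detections.

    nodes:  dict mapping node id -> Box
    arrows: list of (tail_point, head_point) pixel coordinates
    Returns a dict mapping node id -> list of successor ids
    (in the order the arrows were given, without duplicates).
    """
    graph = {node_id: [] for node_id in nodes}
    for tail, head in arrows:
        src = nearest_node(tail, nodes)
        dst = nearest_node(head, nodes)
        # Skip self-loops and duplicate edges in this simplified sketch.
        if src != dst and dst not in graph[src]:
            graph[src].append(dst)
    return graph


# Toy detections for a three-node flow chart (assumed values, not dataset samples).
nodes = {
    "A": Box(0, 0, 100, 50),
    "B": Box(200, 0, 300, 50),
    "C": Box(200, 150, 300, 200),
}
arrows = [((100, 25), (200, 25)), ((250, 50), (250, 150))]
graph = build_graph(nodes, arrows)
print(graph)
```

Nearest-center matching is a deliberately simple choice; it can misassign arrows in dense diagrams, where matching against the closest box edge would likely be more robust.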

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization