DreamStruct: Understanding Slides and User Interfaces via Synthetic Data Generation

Yi-Hao Peng; Faria Huq; Yue Jiang; Jason Wu; Amanda Xin Yue Li; Jeffrey Bigham; Amy Pavel

arXiv:2410.00201 · cs.CV · October 2, 2024

TL;DR

DreamStruct introduces a synthetic data generation method for structured visuals such as slides and UIs. It lets models learn to understand these visuals with minimal manual annotation and improves performance on recognition, description, and classification tasks.

Contribution

The paper presents a novel code-based approach to synthetic data generation that reduces manual labeling and improves model performance on structured visual understanding tasks.

Findings

01 Improved recognition accuracy for visual elements.

02 Enhanced content description capabilities.

03 Better classification of visual content types.

Abstract

Enabling machines to understand structured visuals like slides and user interfaces is essential for making them accessible to people with disabilities. However, achieving such understanding computationally has required manual data collection and annotation, which is time-consuming and labor-intensive. To overcome this challenge, we present a method to generate synthetic, structured visuals with target labels using code generation. Our method allows people to create datasets with built-in labels and train models with a small number of human-annotated examples. We demonstrate performance improvements in three tasks for understanding slides and UIs: recognizing visual elements, describing visual content, and classifying visual content types.
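The idea of "built-in labels" can be illustrated with a minimal sketch. It is not the authors' pipeline: the spec format, the `data-label` attribute, the label names, and the helper names `render_slide` and `extract_labels` are all assumptions made for illustration. The point is only that when a visual is produced from code, the annotation for each element is already known at generation time and does not have to be added by hand.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical element spec: (label, tag, text). Label names are illustrative only.
SPEC = [
    ("title", "h1", "Quarterly Results"),
    ("bullet", "li", "Revenue grew"),
    ("bullet", "li", "Costs fell"),
]


def render_slide(spec):
    """Render a spec as HTML, embedding each element's label as a data attribute."""
    parts = [
        f'<{tag} data-label="{escape(label)}">{escape(text)}</{tag}>'
        for label, tag, text in spec
    ]
    return "<section>" + "".join(parts) + "</section>"


class LabelExtractor(HTMLParser):
    """Collect (label, text) pairs from elements that carry a data-label attribute."""

    def __init__(self):
        super().__init__()
        self.annotations = []
        self._current = None  # label of the element whose text we are waiting for

    def handle_starttag(self, tag, attrs):
        label = dict(attrs).get("data-label")
        if label is not None:
            self._current = label

    def handle_data(self, data):
        if self._current is not None:
            self.annotations.append((self._current, data))
            self._current = None


def extract_labels(html):
    """Recover the annotations directly from the generated markup."""
    parser = LabelExtractor()
    parser.feed(html)
    return parser.annotations


slide_html = render_slide(SPEC)
annotations = extract_labels(slide_html)
```

In a fuller version of this sketch, the markup would be rendered to an image and each element's bounding box recorded alongside its label; that step needs a browser or rendering engine and is omitted here.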

Code & Models

Repositories

yihaop/dreamstruct (Official)

Taxonomy

Topics: Data Visualization and Analytics