Learning to Infer User Interface Attributes from Images
Philippe Schlattner, Pavol Bielik, Martin Vechev

TL;DR
This paper presents a neural approach to automatically infer user interface attributes from images, enabling developers to convert visual designs into implementation-ready specifications with high accuracy.
Contribution
It introduces a method combining synthetic data generation, neural attribute prediction, and imitation learning to accurately infer UI attributes from images.
Findings
Achieved 92.5% accuracy at inferring Android Button attribute values on real-world data.
Demonstrated effective attribute inference using synthetic training data.
Improved pixel-level accuracy through imitation learning.
Abstract
We explore a new domain of learning to infer user interface attributes that helps developers automate the process of user interface implementation. Concretely, given an input image created by a designer, we learn to infer its implementation which when rendered, looks visually the same as the input image. To achieve this, we take a black box rendering engine and a set of attributes it supports (e.g., colors, border radius, shadow or text properties), use it to generate a suitable synthetic training dataset, and then train specialized neural models to predict each of the attribute values. To improve pixel-level accuracy, we additionally use imitation learning to train a neural policy that refines the predicted attribute values by learning to compute the similarity of the original and rendered images in their attribute space, rather than based on the difference of pixel values. We…
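The pipeline the abstract outlines (sample attribute values, render them with a black-box engine, and use the resulting image and attribute pairs as training data, then refine a prediction by re-rendering) can be sketched in a few lines. Everything below is an illustrative assumption rather than the authors' implementation: the `render` function is a toy grid renderer standing in for a real rendering engine, the attribute names and ranges are hypothetical, and `refine` uses a plain greedy search on pixel differences as a placeholder for the learned neural policy, which the abstract says compares images in attribute space instead.

```python
import random

# Hypothetical attribute space; names and ranges are illustrative only.
ATTRIBUTE_RANGES = {
    "width": range(4, 13),
    "height": range(2, 7),
    "border": range(0, 2),
}

CANVAS_W, CANVAS_H = 16, 8


def render(attrs):
    """Toy stand-in for a black-box rendering engine.

    Draws a filled rectangle (value 1) with an optional border (value 2)
    on a CANVAS_H x CANVAS_W grid of zeros.
    """
    image = [[0] * CANVAS_W for _ in range(CANVAS_H)]
    for y in range(attrs["height"]):
        for x in range(attrs["width"]):
            on_edge = y in (0, attrs["height"] - 1) or x in (0, attrs["width"] - 1)
            image[y][x] = 2 if (attrs["border"] and on_edge) else 1
    return image


def sample_attributes(rng):
    """Draw one random value per attribute."""
    return {name: rng.choice(list(values)) for name, values in ATTRIBUTE_RANGES.items()}


def make_dataset(n, seed=0):
    """Build (image, attributes) pairs by rendering sampled attributes."""
    rng = random.Random(seed)
    samples = [sample_attributes(rng) for _ in range(n)]
    return [(render(attrs), attrs) for attrs in samples]


def pixel_distance(a, b):
    """Count the pixels that differ between two equally sized images."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def refine(target_image, attrs, max_steps=50):
    """Greedy refinement: move one attribute by one step while it helps.

    This pixel-based search is a placeholder for the learned policy the
    abstract describes; it is not the paper's method.
    """
    attrs = dict(attrs)
    best = pixel_distance(render(attrs), target_image)
    for _ in range(max_steps):
        improved = False
        for name, values in ATTRIBUTE_RANGES.items():
            for delta in (-1, 1):
                candidate = dict(attrs, **{name: attrs[name] + delta})
                if candidate[name] not in values:
                    continue  # stay inside the supported attribute range
                score = pixel_distance(render(candidate), target_image)
                if score < best:
                    attrs, best, improved = candidate, score, True
        if not improved:
            break
    return attrs, best
```

In this toy setting, `make_dataset(1000)` would supply labelled examples for a per-attribute predictor, and `refine(render(target), guess)` nudges an imperfect guess toward the target by repeatedly re-rendering. A real system would replace both the renderer and the refinement criterion.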
Taxonomy
Topics: Video Analysis and Summarization · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
