Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots
Jason Wu, Xiaoyi Zhang, Jeff Nichols, Jeffrey P. Bigham

TL;DR
This paper introduces screen parsing, the task of automatically identifying UI elements and their relationships from screenshots. Screen parsing can improve UI understanding for accessibility, automation, and design without requiring developers to provide extensive metadata.
Contribution
It presents an implementation of screen parsing and an effective training procedure for it. The implementation significantly outperforms existing systems in the accuracy of its generated output.
Findings
Outperforms current systems by up to 23% in accuracy
Enables applications like UI similarity search, accessibility, and code generation
Demonstrates practical benefits of automated UI understanding
Abstract
Automated understanding of user interfaces (UIs) from their pixels can improve accessibility, enable task automation, and facilitate interface design without relying on developers to comprehensively provide metadata. A first step is to infer what UI elements exist on a screen, but current approaches are limited in how they infer how those elements are semantically grouped into structured interface definitions. In this paper, we motivate the problem of screen parsing, the task of predicting UI elements and their relationships from a screenshot. We describe our implementation of screen parsing and provide an effective training procedure that optimizes its performance. In an evaluation comparing the accuracy of the generated output, we find that our implementation significantly outperforms current systems (up to 23%). Finally, we show three example applications that are facilitated by…
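To make the task concrete, the output of screen parsing can be pictured as a structure in which each node is a detected UI element and each edge is a relationship between elements. The sketch below is a minimal illustration and assumes that these relationships form a containment hierarchy (a tree). The `UIElement` schema, its fields, the element type labels, and the `edges`/`render` helpers are all hypothetical. They are not the paper's data format or its implementation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class UIElement:
    """One detected UI element (hypothetical schema, not the paper's format)."""
    kind: str                          # element type label, e.g. "Button"
    bbox: Tuple[int, int, int, int]    # (x0, y0, x1, y1) in screenshot pixels
    children: List["UIElement"] = field(default_factory=list)


def edges(node: UIElement) -> List[Tuple[str, str]]:
    """Flatten the hierarchy into (parent kind, child kind) pairs, depth-first."""
    out: List[Tuple[str, str]] = []
    for child in node.children:
        out.append((node.kind, child.kind))
        out.extend(edges(child))
    return out


def render(node: UIElement, depth: int = 0) -> List[str]:
    """Return an indented text outline of the hierarchy, one element per line."""
    lines = ["  " * depth + node.kind]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Hypothetical parse of a simple screen: a toolbar above a list.
screen = UIElement("Screen", (0, 0, 1080, 1920), [
    UIElement("Toolbar", (0, 0, 1080, 160), [
        UIElement("Button", (20, 40, 120, 120)),
        UIElement("Text", (140, 50, 600, 110)),
    ]),
    UIElement("List", (0, 160, 1080, 1920), [
        UIElement("ListItem", (0, 160, 1080, 360)),
    ]),
])

if __name__ == "__main__":
    print("\n".join(render(screen)))
    print(edges(screen))
```

Detecting elements alone yields only the flat set of nodes. The grouping edges (for example, that the button and text belong to the toolbar) are the additional structure that screen parsing aims to predict.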
