Screen Parsing: Towards Reverse Engineering of UI Models from Screenshots

Jason Wu; Xiaoyi Zhang; Jeff Nichols; Jeffrey P. Bigham

arXiv:2109.08763 · cs.HC · September 21, 2021

TL;DR

This paper introduces screen parsing, the task of predicting UI elements and their relationships from a screenshot, with the aim of improving accessibility, task automation, and interface design without relying on developers to supply comprehensive metadata.

Contribution

It presents an implementation of screen parsing together with a training procedure that optimizes its performance. In the authors' evaluation of generated output accuracy, the implementation outperforms current systems by up to 23%.

Findings

01. Outperforms current systems by up to 23% in accuracy

02. Enables applications like UI similarity search, accessibility, and code generation

03. Demonstrates practical benefits of automated UI understanding

Abstract

Automated understanding of user interfaces (UIs) from their pixels can improve accessibility, enable task automation, and facilitate interface design without relying on developers to comprehensively provide metadata. A first step is to infer what UI elements exist on a screen, but current approaches are limited in how they infer how those elements are semantically grouped into structured interface definitions. In this paper, we motivate the problem of screen parsing, the task of predicting UI elements and their relationships from a screenshot. We describe our implementation of screen parsing and provide an effective training procedure that optimizes its performance. In an evaluation comparing the accuracy of the generated output, we find that our implementation significantly outperforms current systems (up to 23%). Finally, we show three example applications that are facilitated by…
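To make the target of screen parsing more concrete, the sketch below shows one way the output (UI elements plus their grouping relationships) could be represented as a tree. Everything in it is an illustrative assumption: the `UINode` class, the element labels, the bounding-box format, and the `group_by_containment` helper are hypothetical names, and the naive geometric containment heuristic is only a stand-in for the trained implementation the abstract refers to, which it does not reproduce.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # assumed format: (left, top, right, bottom) in pixels


@dataclass
class UINode:
    """Hypothetical node in a UI hierarchy: an element type, its box, and its children."""
    kind: str
    box: Box
    children: List["UINode"] = field(default_factory=list)


def area(box: Box) -> int:
    return (box[2] - box[0]) * (box[3] - box[1])


def contains(outer: Box, inner: Box) -> bool:
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def group_by_containment(nodes: List[UINode]) -> List[UINode]:
    """Naive stand-in for relationship prediction (NOT the paper's method):
    attach each element to the smallest other element whose box contains it."""
    ordered = sorted(nodes, key=lambda n: area(n.box))
    roots: List[UINode] = []
    for i, node in enumerate(ordered):
        # Only larger-or-equal elements later in the ordering can become parents,
        # so the first match is the smallest container and no cycles can form.
        parent = next((c for c in ordered[i + 1:] if contains(c.box, node.box)), None)
        if parent is None:
            roots.append(node)
        else:
            parent.children.append(node)
    return roots


# Flat detections, as an element detector alone might produce them (made-up values).
detections = [
    UINode("screen", (0, 0, 400, 800)),
    UINode("list", (0, 100, 400, 500)),
    UINode("text", (10, 110, 390, 150)),
    UINode("text", (10, 160, 390, 200)),
    UINode("button", (150, 600, 250, 650)),
]
roots = group_by_containment(detections)
```

Here the flat list of detections becomes a single tree rooted at the screen, with the two text elements grouped under the list. The contrast between the flat input and the nested output is the point of the example; how the relationships are actually inferred is where a learned approach would replace the heuristic.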
