TL;DR
Screen2Vec is a self-supervised technique for embedding GUI screens and components. Its embeddings capture textual, visual, and contextual semantics without manual annotation, which supports modeling user-GUI interactions and mining GUI designs.
Contribution
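It introduces a self-supervised method, inspired by Word2Vec, that learns GUI representations from the context of user interaction traces. This removes the need for manual annotation and encodes textual content, visual design, and app context together, where earlier representations covered only one of these.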
Findings
Effective in representing screen similarity and user tasks
Supports composability of GUI components
Demonstrates utility in downstream GUI analysis tasks
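In Word2Vec-style embedding spaces, similarity is commonly measured with cosine similarity and composability with vector arithmetic (analogies such as a - b + c). The sketch below assumes that convention to show what these two findings mean operationally. The helper names, the screen names, and the 3-d vectors are hypothetical toy values, not Screen2Vec outputs or results reported in the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(query, candidates, exclude=()):
    """Name of the candidate embedding with the highest cosine similarity to query."""
    pool = {name: vec for name, vec in candidates.items() if name not in exclude}
    return max(pool, key=lambda name: cosine(query, pool[name]))


def compose(a, b, c):
    """Word2Vec-style analogy arithmetic: a - b + c, element-wise."""
    return [x - y + z for x, y, z in zip(a, b, c)]


# Hypothetical 3-d screen embeddings for illustration only; learned
# embeddings would be much higher-dimensional.
screens = {
    "app_a_request_ride": [0.9, 0.1, 0.0],
    "app_b_request_ride": [0.8, 0.2, 0.1],
    "app_a_login": [0.1, 0.9, 0.0],
    "app_b_login": [0.0, 0.9, 0.2],
}

# Similarity: the screen closest to app A's ride request (excluding itself).
print(nearest(screens["app_a_request_ride"], screens, exclude={"app_a_request_ride"}))
# → app_b_request_ride

# Composability: A's ride request - A's login + B's login.
analogy = compose(screens["app_a_request_ride"], screens["app_a_login"], screens["app_b_login"])
print(nearest(analogy, screens))
# → app_b_request_ride
```

Cosine similarity ignores vector length, so screens are compared by direction in the embedding space; whether Screen2Vec's own evaluation uses exactly this measure is an assumption here.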
Abstract
Representing the semantics of GUI screens and components is crucial to data-driven computational methods for modeling user-GUI interactions and mining GUI designs. Existing GUI semantic representations are limited to encoding either the textual content, the visual design and layout patterns, or the app contexts. Many representation techniques also require significant manual data annotation efforts. This paper presents Screen2Vec, a new self-supervised technique for generating representations in embedding vectors of GUI screens and components that encode all of the above GUI features without requiring manual annotation using the context of user interaction traces. Screen2Vec is inspired by the word embedding method Word2Vec, but uses a new two-layer pipeline informed by the structure of GUIs and interaction traces and incorporates screen- and app-specific metadata. Through several sample…
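The abstract does not spell out the two-layer pipeline, so the sketch below illustrates only the Word2Vec idea it borrows: a continuous-bag-of-words (CBOW) objective in which a screen in an interaction trace is predicted from its neighboring screens. The function names, the toy 2-d embeddings, the window size, and the full-softmax cross-entropy loss are all assumptions for illustration; this is not the paper's architecture or training code.

```python
import math


def context_vector(trace, index, embeddings, window=1):
    """Average the embeddings of screens within `window` steps of trace[index],
    excluding the target screen itself (CBOW-style context)."""
    neighbors = [
        trace[j]
        for j in range(max(0, index - window), min(len(trace), index + window + 1))
        if j != index
    ]
    if not neighbors:
        raise ValueError("trace needs at least one neighboring screen")
    dim = len(embeddings[trace[index]])
    total = [0.0] * dim
    for name in neighbors:
        for k, value in enumerate(embeddings[name]):
            total[k] += value
    return [value / len(neighbors) for value in total]


def cbow_loss(trace, index, embeddings, window=1):
    """Cross-entropy of predicting trace[index] from its context, using a
    softmax over dot-product scores against every known screen."""
    ctx = context_vector(trace, index, embeddings, window)
    scores = {
        name: sum(a * b for a, b in zip(ctx, vec)) for name, vec in embeddings.items()
    }
    max_score = max(scores.values())  # subtract the max for numerical stability
    log_norm = max_score + math.log(
        sum(math.exp(s - max_score) for s in scores.values())
    )
    return log_norm - scores[trace[index]]


# Hypothetical trace of screens visited in one task, with toy embeddings.
embeddings = {
    "home": [1.0, 0.0],
    "search": [0.5, 0.5],
    "results": [0.0, 1.0],
    "login": [-1.0, -1.0],
}
trace = ["home", "search", "results"]
print(cbow_loss(trace, 1, embeddings))
```

Training would adjust the embeddings to lower this loss across many traces, so screens that occur in similar interaction contexts end up close together; that is the intuition behind using traces instead of manual labels.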
