Screen2Words: Automatic Mobile UI Summarization with Multimodal Learning

Bryan Wang; Gang Li; Xin Zhou; Zhourong Chen; Tovi Grossman; Yang Li

arXiv:2108.03353 · cs.HC · August 10, 2021 · 1 citation

TL;DR

Screen2Words is a multimodal learning approach that automatically generates concise natural-language descriptions of mobile UI screens. It is built on a large, human-annotated dataset of screen summaries and a set of deep models, with the goal of making UI content easier to understand and to convey through language.

Contribution

The paper introduces a novel multimodal learning method for mobile UI summarization, supported by a large-scale annotated dataset and comprehensive model evaluations.

Findings

01. High-quality summaries generated by the models
02. Large-scale dataset with over 112k annotations
03. Effective multimodal learning approach for UI understanding

Abstract

Mobile user interface (UI) summarization generates succinct language descriptions of mobile screens that convey a screen's important content and functionality, which can be useful in many language-based application scenarios. We present Screen2Words, a novel screen summarization approach that automatically encapsulates the essential information of a UI screen in a coherent language phrase. Summarizing mobile screens requires a holistic understanding of the multimodal data of mobile UIs, including text, images, structure, and UI semantics, which motivates our multimodal learning approach. We collected and analyzed a large-scale screen summarization dataset annotated by human workers. Our dataset contains more than 112k language summaries across ~22k unique UI screens. We then experimented with a set of deep models with different configurations. Our evaluation of these…
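The abstract names text, images, structure, and UI semantics as the modalities a screen summarizer has to combine. The sketch below shows one simple way such inputs could be fused into a per-element feature sequence that a sequence model (for example, a Transformer encoder) might consume. It is an illustrative assumption, not the Screen2Words architecture or code: `UIElement`, `ELEMENT_TYPES`, the hashing text featurizer (a stand-in for a learned text encoder), and the precomputed `image_feature` values are all hypothetical.

```python
import zlib
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical UI-semantics vocabulary; unknown types map to "Other".
ELEMENT_TYPES = ["Button", "TextView", "ImageView", "EditText", "Other"]


@dataclass
class UIElement:
    text: str                           # visible text or content description
    image_feature: List[float]          # hypothetical precomputed visual embedding
    bbox: Tuple[int, int, int, int]     # (left, top, right, bottom) in pixels
    element_type: str                   # e.g. "Button", "TextView"


def text_feature(text: str, dim: int = 8) -> List[float]:
    # Deterministic bag-of-words hashing: a stand-in for a learned text encoder.
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    return vec


def structure_feature(bbox: Tuple[int, int, int, int], w: int, h: int) -> List[float]:
    # Bounding box normalized by screen size, so layout is resolution-independent.
    left, top, right, bottom = bbox
    return [left / w, top / h, right / w, bottom / h]


def type_feature(element_type: str) -> List[float]:
    # One-hot encoding of the element's semantic type.
    idx = ELEMENT_TYPES.index(element_type) if element_type in ELEMENT_TYPES else len(ELEMENT_TYPES) - 1
    one_hot = [0.0] * len(ELEMENT_TYPES)
    one_hot[idx] = 1.0
    return one_hot


def screen_input(elements: List[UIElement], w: int, h: int) -> List[List[float]]:
    # Concatenate text, image, structure, and type features for each element.
    return [
        text_feature(e.text) + e.image_feature
        + structure_feature(e.bbox, w, h) + type_feature(e.element_type)
        for e in elements
    ]


if __name__ == "__main__":
    elements = [
        UIElement("Sign in", [0.1, 0.2], (40, 900, 1040, 1000), "Button"),
        UIElement("Email", [0.0, 0.5], (40, 500, 1040, 600), "EditText"),
    ]
    seq = screen_input(elements, 1080, 1920)
    print(len(seq), len(seq[0]))  # -> 2 19
```

Concatenation is the simplest fusion choice and keeps each modality's contribution visible in a fixed slice of the vector; a learned model would more likely project each modality into a shared embedding space before combining them.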

Taxonomy

Topics: Multimodal Machine Learning Applications · AI in Service Interactions · Topic Modeling