Multimodal Icon Annotation For Mobile Applications

Xiaoxue Zang; Ying Xu; Jindong Chen

arXiv:2107.04452 · cs.CV · July 12, 2021

TL;DR

This paper introduces a multi-modal deep learning approach that combines pixel and view hierarchy features to improve icon annotation in mobile UIs; it outperforms baseline object classification models.

Contribution

The study presents a novel multi-modal method that leverages both pixel data and view hierarchy information for UI element annotation, along with a new annotated dataset.

Findings

01. Outperforms baseline object classification models

02. Effective combination of view hierarchy and pixel features

03. Provides a new high-quality annotated UI dataset

Abstract

Annotating user interfaces (UIs), which involves localizing and classifying meaningful UI elements on a screen, is a critical step for many mobile applications such as screen readers and voice control of devices. Annotating object icons, such as menu, search, and arrow backward, is especially challenging due to the lack of explicit labels on screens, their similarity to pictures, and their diverse shapes. Existing studies use either view hierarchy or pixel based methods to tackle the task. Pixel based approaches are more popular because view hierarchy features on mobile platforms are often incomplete or inaccurate; however, they leave out instructional information in the view hierarchy, such as resource-ids or content descriptions. We propose a novel deep learning based multi-modal approach that combines the benefits of both pixel and view hierarchy features and leverages the…
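
As a rough illustration of how instructional hints in the view hierarchy might be combined with pixel features, the Python sketch below tokenizes a node's resource-id and content description and appends a bag-of-words vector to a pixel feature vector. It is not the authors' model. The field names `resource-id` and `content-desc` are assumed to follow common Android view hierarchy dumps, and `VOCAB`, `hierarchy_tokens`, `text_features`, `fuse`, the example node, and the choice of plain concatenation are all hypothetical.

```python
import re

# Hypothetical vocabulary of icon-related tokens; not taken from the paper.
VOCAB = ["menu", "search", "back", "arrow", "close"]


def hierarchy_tokens(node):
    """Split the resource-id and content description into lowercase word tokens."""
    resource_id = node.get("resource-id", "")
    # Keep only the part after the last "/" (e.g. "com.app:id/btn_search" -> "btn_search").
    name = resource_id.split("/")[-1]
    text = name + " " + node.get("content-desc", "")
    # Break camelCase words apart before splitting on non-letter characters.
    text = re.sub(r"([a-z])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def text_features(node):
    """Binary bag-of-words vector over VOCAB."""
    tokens = set(hierarchy_tokens(node))
    return [1.0 if word in tokens else 0.0 for word in VOCAB]


def fuse(pixel_features, node):
    """Concatenate pixel and view hierarchy features (simple fusion; an assumption)."""
    return list(pixel_features) + text_features(node)


# Hypothetical view hierarchy node for a search icon.
node = {"resource-id": "com.example.app:id/btnSearch", "content-desc": "Search"}
pixel_features = [0.12, 0.80, 0.33]  # stand-in for an image embedding of the icon crop
fused = fuse(pixel_features, node)
```

When a node carries no usable hierarchy information, the text part of the fused vector is all zeros, so a downstream classifier in this sketch would have to rely on the pixel features alone.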

Taxonomy

Topics: Web Data Mining and Analysis · Gaze Tracking and Assistive Technology · Advanced Image and Video Retrieval Techniques