MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling

Sidong Feng; Suyu Ma; Han Wang; David Kong; Chunyang Chen

arXiv:2405.07090 · cs.HC · May 14, 2024

TL;DR

This paper presents MUD, a large, high-quality UI dataset mined from Android apps using LLMs and human validation, to improve UI modeling tasks like element detection and retrieval.

Contribution

It introduces a novel LLM-based approach for automatic UI data mining and noise filtering, creating a large, modern UI dataset with human annotations for research use.

Findings

01. Successfully mined 18,000 human-annotated UIs from 3,300 apps

02. Demonstrated the dataset's effectiveness in element detection tasks

03. Showcased improvements in UI retrieval accuracy

Abstract

The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such modeling requires a high-quality UI dataset. Existing datasets are often outdated, having been collected years ago, and are frequently noisy, with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset, MUD, of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element…
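The abstract does not spell out the noise-filtering rules, so the following is only a minimal sketch of one common kind of check: dropping view-hierarchy elements whose bounds cannot correspond to anything visible on the screenshot. The `UIElement` type, its fields, and the two rules are assumptions made for illustration, not the authors' pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class UIElement:
    """Hypothetical view-hierarchy node: a class name plus pixel bounds."""
    cls: str
    left: int
    top: int
    right: int
    bottom: int


def is_noisy(el: UIElement, screen_w: int, screen_h: int) -> bool:
    """Return True if the element's bounds cannot match a visible region."""
    width = el.right - el.left
    height = el.bottom - el.top
    # Assumed rule 1: zero or negative area means nothing is drawn.
    if width <= 0 or height <= 0:
        return True
    # Assumed rule 2: bounds that leave the screen do not match the screenshot.
    if el.left < 0 or el.top < 0 or el.right > screen_w or el.bottom > screen_h:
        return True
    return False


def filter_elements(elements: List[UIElement], screen_w: int, screen_h: int) -> List[UIElement]:
    """Keep only the elements that pass both bounds checks."""
    return [el for el in elements if not is_noisy(el, screen_w, screen_h)]


# Illustrative input for a 1080x1920 screen (values are made up).
elements = [
    UIElement("Button", 0, 0, 100, 50),          # valid
    UIElement("TextView", 10, 10, 10, 40),       # zero width
    UIElement("ImageView", 1000, 0, 1200, 100),  # extends past the right edge
]
kept = filter_elements(elements, 1080, 1920)
```

In this example only the `Button` survives. A real pipeline would likely add further checks (for instance, comparing element metadata against the rendered pixels), which the abstract does not detail.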

Code & Models

Repositories

sidongfeng/MUD (Official)

Taxonomy

Topics: Web Data Mining and Analysis · Multimedia Communication and Technology · Recommender Systems and Techniques