MUD: Towards a Large-Scale and Noise-Filtered UI Dataset for Modern Style UI Modeling
Sidong Feng, Suyu Ma, Han Wang, David Kong, Chunyang Chen

TL;DR
This paper presents MUD, a large, high-quality UI dataset mined from Android apps using LLMs and human validation, to improve UI modeling tasks like element detection and retrieval.
Contribution
It introduces a novel LLM-based approach for automatic UI data mining and noise filtering, creating a large, modern UI dataset with human annotations for research use.
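As a rough illustration of what a noise-filtering step could look like, the sketch below drops UI elements whose bounding boxes are empty or extend beyond the screen. The `Element` and `Screen` types, the `is_noisy` helper, and both heuristics are assumptions made for this example; this summary does not state which filtering rules MUD actually applies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Element:
    """Hypothetical UI element taken from a view hierarchy."""
    label: str
    bounds: Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Screen:
    """Hypothetical container for one captured UI."""
    width: int
    height: int
    elements: List[Element]


def is_noisy(el: Element, screen: Screen) -> bool:
    """Flag elements that cannot match anything visible on the screenshot."""
    left, top, right, bottom = el.bounds
    # Assumed heuristic 1: zero or negative area means nothing is rendered.
    if right <= left or bottom <= top:
        return True
    # Assumed heuristic 2: bounds outside the screen cannot be seen in full.
    if left < 0 or top < 0 or right > screen.width or bottom > screen.height:
        return True
    return False


def filter_screen(screen: Screen) -> Screen:
    """Return a copy of the screen that keeps only non-noisy elements."""
    kept = [el for el in screen.elements if not is_noisy(el, screen)]
    return Screen(screen.width, screen.height, kept)
```

In a pipeline of the kind the paper describes, a filter like this would run on automatically collected UIs before human annotators perform the final validation.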
Findings
Mined 18,000 UIs from 3,300 apps, each validated by human annotation
Demonstrated the dataset's effectiveness in element detection tasks
Showcased improvements in UI retrieval accuracy
Abstract
The importance of computational modeling of mobile user interfaces (UIs) is undeniable. However, such models require a high-quality UI dataset. Existing datasets are often outdated, having been collected years ago, and are frequently noisy, with mismatches in their visual representation. This presents challenges in modeling UI understanding in the wild. This paper introduces a novel approach to automatically mine UI data from Android apps, leveraging Large Language Models (LLMs) to mimic human-like exploration. To ensure dataset quality, we employ best practices in UI noise filtering and incorporate human annotation as a final validation step. Our results demonstrate the effectiveness of LLM-enhanced app exploration in mining more meaningful UIs, resulting in a large dataset, MUD, of 18k human-annotated UIs from 3.3k apps. We highlight the usefulness of MUD in two common UI modeling tasks: element…
Taxonomy
Topics: Web Data Mining and Analysis · Multimedia Communication and Technology · Recommender Systems and Techniques
