UICrit: Enhancing Automated Design Evaluation with a UI Critique Dataset
Peitong Duan, Chin-yi Chen, Gang Li, Bjoern Hartmann, Yang Li

TL;DR
This paper introduces UICrit, a dataset of 3,059 design critiques and quality ratings for 983 mobile UIs. Applying the dataset to general-purpose LLMs yields a 55% performance gain in generated UI feedback, and the authors see potential applications in training reward models and fine-tuning multi-modal LLMs.
Contribution
The paper presents a new dataset of UI critiques collected from seven experienced designers and shows that applying it yields a 55% performance gain in LLM-generated UI feedback.
Findings
55% performance improvement in LLM-generated UI feedback
Dataset contains 3,059 critiques for 983 mobile UIs
Potential for training reward models and fine-tuning multi-modal LLMs
Abstract
Automated UI evaluation can be beneficial for the design process; for example, to compare different UI designs, or conduct automated heuristic evaluation. LLM-based UI evaluation, in particular, holds the promise of generalizability to a wide variety of UI types and evaluation tasks. However, current LLM-based techniques do not yet match the performance of human evaluators. We hypothesize that automatic evaluation can be improved by collecting a targeted UI feedback dataset and then using this dataset to enhance the performance of general-purpose LLMs. We present a targeted dataset of 3,059 design critiques and quality ratings for 983 mobile UIs, collected from seven experienced designers. We carried out an in-depth analysis to characterize the dataset's features. We then applied this dataset to achieve a 55% performance gain in LLM-generated UI feedback via various few-shot and visual…
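As a rough illustration of how a critique dataset could feed few-shot prompting, the sketch below picks the stored examples most similar to a target screen and formats them into a prompt. It is not the authors' method: the record fields (`ui_description`, `critique`, `rating`), the word-overlap similarity, the prompt wording, and the sample records are all assumptions made for this example, and the real dataset's schema may differ.

```python
from dataclasses import dataclass


@dataclass
class CritiqueExample:
    # Hypothetical record layout; the real dataset's schema may differ.
    ui_description: str  # short text description of the screen
    critique: str        # a designer's comment about the screen
    rating: int          # design quality rating (scale assumed here)


def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (a stand-in metric)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def build_few_shot_prompt(examples, target_description, k=2):
    """Pick the k examples most similar to the target and format a prompt."""
    ranked = sorted(
        examples,
        key=lambda ex: word_overlap(ex.ui_description, target_description),
        reverse=True,
    )
    parts = ["You are a UI design reviewer. Critique the final screen."]
    for ex in ranked[:k]:
        parts.append(
            f"Screen: {ex.ui_description}\n"
            f"Critique: {ex.critique}\n"
            f"Rating: {ex.rating}"
        )
    # The target screen goes last, with the critique left open for the model.
    parts.append(f"Screen: {target_description}\nCritique:")
    return "\n\n".join(parts)


# Invented placeholder records, not taken from the dataset.
examples = [
    CritiqueExample("login screen with email and password fields",
                    "The submit button has low contrast.", 4),
    CritiqueExample("settings list with toggle switches",
                    "Toggle labels are misaligned.", 6),
    CritiqueExample("photo gallery grid",
                    "Thumbnails are too small.", 5),
]
prompt = build_few_shot_prompt(
    examples, "login screen with password field and submit button", k=2
)
```

The resulting `prompt` string can be passed to any LLM client. A multi-modal setup would more likely attach screenshots than text descriptions, and swapping the similarity function only requires changing `word_overlap`.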
Taxonomy
Topics: Software Reliability and Analysis Research · Embedded Systems Design Techniques · Adversarial Robustness in Machine Learning
