Do MLLMs Capture How Interfaces Guide User Behavior? A Benchmark for Multimodal UI/UX Design Understanding
Jaehyun Jeon, Min Soo Kim, Jang Han Yoon, Sumin Shim, Yejin Choi, Hanbin Kim, Dae Hyun Kim, Youngjae Yu

TL;DR
This paper introduces WiserUI-Bench, a benchmark for evaluating how well Multimodal Large Language Models (MLLMs) understand the influence of UI/UX design on user behavior. Experiments on it reveal clear limitations in current models.
Contribution
The paper presents WiserUI-Bench, a new benchmark built on real-world UI pairs from industry A/B tests. It assesses whether MLLMs understand how UI/UX design affects user actions and how well they explain that effect.
Findings
MLLMs show limited understanding of UI/UX influence on behavior.
Models struggle to predict the more effective UI in A/B tests.
Expert-curated interpretations aid post-hoc explanations.
Abstract
User interface (UI) design goes beyond visuals to shape user experience (UX), underscoring the shift toward UI/UX as a unified concept. While recent studies have explored UI evaluation using Multimodal Large Language Models (MLLMs), they largely focus on surface-level features, overlooking how design choices influence user behavior at scale. To fill this gap, we introduce WiserUI-Bench, a novel benchmark for multimodal understanding of how UI/UX design affects user behavior, built on 300 real-world UI image pairs from industry A/B tests, with empirically validated winners that induced more user actions. For future design progress in practice, post-hoc understanding of why such winners succeed with mass users is also required; we support this via expert-curated key interpretations for each instance. Experiments across multiple MLLMs on WiserUI-Bench for two main tasks, (1) predicting the…
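The first task, predicting which variant of an A/B-tested UI pair won, is a binary choice, so a natural score is accuracy against the empirically validated winner, where chance is 50%. The sketch below is hypothetical: the `UIPair` record, its field names, and the order-swap consistency check are illustrative assumptions, not the benchmark's released format or the authors' evaluation protocol.

```python
from dataclasses import dataclass


@dataclass
class UIPair:
    # Hypothetical record for one instance: two UI variants from an A/B test.
    pair_id: str
    image_a: str  # path to variant A screenshot (assumed field)
    image_b: str  # path to variant B screenshot (assumed field)
    winner: str   # "A" or "B": the variant that induced more user actions


def winner_accuracy(pairs, predictions):
    """Fraction of pairs whose predicted winner matches the A/B test winner.

    `predictions` maps pair_id -> "A" or "B"; missing answers count as wrong.
    """
    if not pairs:
        raise ValueError("no pairs to score")
    correct = sum(1 for p in pairs if predictions.get(p.pair_id) == p.winner)
    return correct / len(pairs)


def consistent_accuracy(pairs, pred_ab, pred_ba):
    """Count a pair as correct only if the model picks the true winner in both
    presentation orders (an assumed check against position bias).

    pred_ab: answers with variant A shown first.
    pred_ba: answers with the order swapped, so "A" there means original B.
    """
    if not pairs:
        raise ValueError("no pairs to score")
    flip = {"A": "B", "B": "A"}  # map swapped-order labels back to originals
    correct = 0
    for p in pairs:
        first = pred_ab.get(p.pair_id)
        second = flip.get(pred_ba.get(p.pair_id))
        if first == second == p.winner:
            correct += 1
    return correct / len(pairs)


pairs = [
    UIPair("p1", "a.png", "b.png", "A"),
    UIPair("p2", "c.png", "d.png", "B"),
]
print(winner_accuracy(pairs, {"p1": "A", "p2": "A"}))  # 0.5
print(consistent_accuracy(pairs, {"p1": "A", "p2": "B"}, {"p1": "B", "p2": "B"}))  # 0.5
```

Requiring agreement across both orders is stricter than single-order accuracy: a model that always answers "the first image" scores 50% on a balanced set under `winner_accuracy` but 0% under `consistent_accuracy`.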
Taxonomy
Topics: Data Visualization and Analytics · Persona Design and Applications · Interactive and Immersive Displays
