What Would GPT Click: Practical Effects of Human-AI Behavioral Misalignment and the Cost of Synthetic Participants in User Experience
Eduard Kuric, Peter Demcak, Matus Krajcovic

TL;DR
This study systematically evaluates GPT's ability to mimic human click behavior and reasoning in UX tests, revealing significant discrepancies and limitations that impact decision-making accuracy.
Contribution
It provides the first comprehensive analysis of the practical issues and distortions in synthetic user responses generated by GPT in real UX research contexts.
Findings
GPT's click distribution differs significantly from real user data in 53% of tasks
Efforts to improve fidelity via personas and reasoning methods are ineffective
Synthetic responses exhibit distortions that undermine their usefulness for UX decision-making
Abstract
Synthetic participants represent a methodologically concerning concept that threatens the integrity of UX research. Findings from previous experiments specify how AI outputs are misaligned with the behaviors and thoughts of real humans in various ways. However, industry voices keep underestimating their severity, advocating for practical compromises where good-enough data does not need to be perfect, and all issues will be solved by future tuning. Our study tackles the lack of systematic understanding of the practical issues that arise with synthetic behavior and its use for steering decisions within real contexts. Within twelve diverse first click tests (n = 3431) obtained from real UX practice, we examine the ability of GPT to predict where humans click and how they reason about their behavior. Results (e.g., significantly different distribution from real data in 53% of tasks)…
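As an illustration of what "significantly different distribution" can mean for a single first click test, the sketch below compares human and synthetic click counts per screen region with a chi-square test of homogeneity. This is a minimal sketch under stated assumptions, not the authors' analysis: the abstract does not name the statistical test used, and the region names, click counts, and the function name chi_square_homogeneity are all hypothetical.

```python
from collections import Counter


def chi_square_homogeneity(human_clicks, synthetic_clicks):
    """Return (statistic, degrees_of_freedom) for a 2 x K table of click counts.

    Each argument is a list of region labels, one per participant's first click.
    """
    human = Counter(human_clicks)
    synthetic = Counter(synthetic_clicks)
    regions = sorted(set(human) | set(synthetic))
    n_human = sum(human.values())
    n_synthetic = sum(synthetic.values())
    total = n_human + n_synthetic

    statistic = 0.0
    for region in regions:
        column_total = human[region] + synthetic[region]
        for observed, row_total in (
            (human[region], n_human),
            (synthetic[region], n_synthetic),
        ):
            # Expected count if both groups shared one click distribution.
            expected = row_total * column_total / total
            statistic += (observed - expected) ** 2 / expected
    return statistic, len(regions) - 1


# Hypothetical data for one task: 100 human and 100 synthetic first clicks.
human = ["menu"] * 60 + ["search"] * 30 + ["footer"] * 10
synthetic = ["menu"] * 90 + ["search"] * 10

statistic, dof = chi_square_homogeneity(human, synthetic)
# The 5% critical value for 2 degrees of freedom is 5.991.
print(statistic, dof, statistic > 5.991)
```

With these made-up counts the statistic exceeds the 5% critical value, so the two click distributions would be judged different for this task. Repeating such a comparison per task is one way a "share of tasks that differ" figure could be produced, though small expected counts would call for an exact or permutation test instead.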
