Synthetic Cognitive Walkthrough: Aligning Large Language Model Performance with Human Cognitive Walkthrough
Ruican Zhong, David W. McDonald, Gary Hsieh

TL;DR
This paper investigates using advanced large language models like GPT-4 to automate and enhance usability testing through cognitive walkthroughs, comparing their performance with human participants and exploring their potential to scale UI evaluation.
Contribution
It demonstrates that LLMs can simulate human-like usability walkthroughs, identify potential failure points, and, with additional prompting, align more closely with human behavior, offering a scalable alternative to traditional methods.
Findings
LLMs can navigate interfaces and provide rationales similar to humans.
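LLMs achieve higher task completion rates than humans in cognitive walkthroughs (CW) and follow more optimal navigation paths, but identify fewer potential failure points.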
Additional prompting aligns LLM predictions with human-identified failure points.
Abstract
Conducting usability testing like cognitive walkthrough (CW) can be costly. Recent developments in large language models (LLMs), with visual reasoning and UI navigation capabilities, present opportunities to automate CW. We explored whether LLMs (GPT-4 and Gemini-2.5-pro) can simulate human behavior in CW by comparing their walkthroughs with human participants. While LLMs could navigate interfaces and provide reasonable rationales, their behavior differed from humans. LLM-prompted CW achieved higher task completion rates than humans and followed more optimal navigation paths, while identifying fewer potential failure points. However, follow-up studies demonstrated that with additional prompting, LLMs can predict human-identified failure points, aligning their performance with human participants. Our work highlights that while LLMs may not replicate human behaviors exactly, they can be…
Taxonomy
Topics: Data Visualization and Analytics · Usability and User Interface Design · AI in Service Interactions
