PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent
Hongyi Nie, Xunyuan Liu, Yudong Bai, Yaqing Wang, Yang Liu, Quanming Yao, Zhen Wang

TL;DR
PSPA-Bench is a new benchmark for evaluating smartphone GUI agents' ability to deliver personalized assistance across diverse real-world scenarios, highlighting current limitations and guiding future improvements.
Contribution
The paper introduces PSPA-Bench, which pairs personalized instructions with a structure-aware evaluation method, providing a systematic way to assess and advance personalized GUI agents.
Findings
Current agents perform poorly in personalized settings.
Reasoning-oriented models outperform general LLMs.
Perception and memory are critical for personalization.
Abstract
Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration. However, real-world smartphone use is highly personalized: users adopt diverse workflows and preferences, challenging agents to deliver customized assistance rather than generic solutions. Existing GUI agent benchmarks cannot adequately capture this personalization dimension due to sparse user-specific data and the lack of fine-grained evaluation metrics. To address this gap, we present PSPA-Bench, a benchmark dedicated to evaluating personalization in smartphone GUI agents. PSPA-Bench comprises over 12,855 personalized instructions aligned with real-world user behaviors across 10 representative daily-use scenarios and 22 mobile apps, and introduces a structure-aware process evaluation method that measures agents' personalized capabilities…
