PSPA-Bench: A Personalized Benchmark for Smartphone GUI Agent

Hongyi Nie; Xunyuan Liu; Yudong Bai; Yaqing Wang; Yang Liu; Quanming Yao; Zhen Wang

arXiv:2603.29318 · cs.AI · April 1, 2026

TL;DR

PSPA-Bench is a new benchmark for evaluating smartphone GUI agents' ability to deliver personalized assistance across diverse real-world scenarios, highlighting current limitations and guiding future improvements.

Contribution

It introduces PSPA-Bench with personalized instructions and a structure-aware evaluation method, providing a systematic way to assess and advance personalized GUI agents.

Findings

1. Current agents perform poorly in personalized settings.
2. Reasoning-oriented models outperform general LLMs.
3. Perception and memory are critical for personalization.

Abstract

Smartphone GUI agents execute tasks by operating directly on app interfaces, offering a path to broad capability without deep system integration. However, real-world smartphone use is highly personalized: users adopt diverse workflows and preferences, so agents must deliver customized assistance rather than generic solutions. Existing GUI agent benchmarks cannot adequately capture this personalization dimension because user-specific data is sparse and fine-grained evaluation metrics are lacking. To address this gap, we present PSPA-Bench, a benchmark dedicated to evaluating personalization in smartphone GUI agents. PSPA-Bench comprises over 12,855 personalized instructions aligned with real-world user behaviors across 10 representative daily-use scenarios and 22 mobile apps, and introduces a structure-aware process evaluation method that measures agents' personalized capabilities…
