Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization
Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin

TL;DR
This paper introduces a benchmark and methods for creating mobile GUI agents that mimic human behavior convincingly to evade detection, balancing utility and human-like imitation.
Contribution
It formalizes the Turing Test on Screen as a MinMax optimization, establishes the Agent Humanization Benchmark, and proposes techniques for high imitability without losing performance.
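As a rough illustration of what a detector-versus-agent MinMax formulation can look like, here is a standard GAN-style objective. This is an assumption for exposition, not the paper's own equation: the symbols $\pi$ (agent policy), $D$ (detector), $p_{\text{human}}$ and $p_{\pi}$ (distributions over touch trajectories $\tau$), $U$ (task utility), and $U_0$ (a utility floor) are all hypothetical names introduced here.

```latex
\min_{\pi}\; \max_{D}\;
  \mathbb{E}_{\tau \sim p_{\text{human}}}\bigl[\log D(\tau)\bigr]
  + \mathbb{E}_{\tau \sim p_{\pi}}\bigl[\log\bigl(1 - D(\tau)\bigr)\bigr]
\quad \text{subject to} \quad U(\pi) \ge U_0
```

Under this particular choice, the inner maximum equals $2\,\mathrm{JSD}(p_{\text{human}} \,\|\, p_{\pi}) - \log 4$, so the agent's outer minimization amounts to shrinking a behavioral divergence while the constraint keeps task performance from degrading. The paper may use a different divergence or a different way of coupling utility and imitability.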
Findings
Vanilla LMM-based agents are easily detectable due to unnatural kinematics.
Agents can achieve high imitability without sacrificing task performance.
The proposed methods improve agent humanization in adversarial environments.
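To make the first finding concrete, the sketch below contrasts a constant-velocity straight-line swipe with one perturbed by simple heuristic noise. It is a minimal illustration under stated assumptions, not the paper's method: the function names, the cosine easing, the sideways bow, and the jitter magnitudes are all hypothetical choices made for this example.

```python
import math
import random


def linear_swipe(start, end, steps=20, duration_ms=200.0):
    """Straight line at constant velocity, as a naive script might emit."""
    points = []
    for i in range(steps + 1):
        t = i / steps
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        points.append((x, y, t * duration_ms))
    return points


def humanized_swipe(start, end, steps=20, duration_ms=200.0,
                    jitter_px=1.5, bow_px=12.0, rng=None):
    """Same endpoints, but with eased speed, a slight bow, and pixel jitter.

    All three perturbations are illustrative heuristics (assumptions).
    """
    rng = rng or random.Random()
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = math.hypot(dx, dy) or 1.0
    nx, ny = -dy / length, dx / length      # unit normal to the swipe direction
    bow = rng.uniform(-bow_px, bow_px)      # sideways curvature for this swipe
    points = []
    for i in range(steps + 1):
        t = i / steps
        s = 0.5 - 0.5 * math.cos(math.pi * t)   # slow start, fast middle, slow end
        arc = bow * math.sin(math.pi * s)       # vanishes at both endpoints
        interior = 0 < i < steps                # keep the endpoints on target
        jx = rng.gauss(0.0, jitter_px) if interior else 0.0
        jy = rng.gauss(0.0, jitter_px) if interior else 0.0
        x = start[0] + s * dx + arc * nx + jx
        y = start[1] + s * dy + arc * ny + jy
        points.append((x, y, t * duration_ms))
    return points


def speeds(points):
    """Per-segment speed in pixels per millisecond."""
    return [
        math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:])
    ]


if __name__ == "__main__":
    naive = speeds(linear_swipe((100, 800), (100, 300)))
    noisy = speeds(humanized_swipe((100, 800), (100, 300), rng=random.Random(0)))
    print("naive speed spread:", max(naive) - min(naive))
    print("noisy speed spread:", max(noisy) - min(noisy))
```

The naive trace has an essentially flat speed profile, which is the kind of kinematic regularity a detector could key on. The perturbed trace varies in speed and path while still ending on the target. Heuristics like these are only a starting point, and a learned detector may still separate them from real touch data.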
Abstract
The rise of autonomous GUI agents has triggered adversarial countermeasures from digital platforms, yet existing research prioritizes utility and robustness over the critical dimension of anti-detection. We argue that for agents to survive in human-centric ecosystems, they must evolve Humanization capabilities. We introduce the "Turing Test on Screen," formally modeling the interaction as a MinMax optimization problem between a detector and an agent aiming to minimize behavioral divergence. We then collect a new high-fidelity dataset of mobile touch dynamics and show through analysis that vanilla LMM-based agents are easily detectable due to unnatural kinematics. Consequently, we establish the Agent Humanization Benchmark (AHB) and detection metrics to quantify the trade-off between imitability and utility. Finally, we propose methods ranging from heuristic noise to data-driven…
