TL;DR
GUIGuard-Bench introduces a novel benchmark dataset with annotated GUI trajectories to evaluate privacy-preserving strategies in GUI agents, highlighting current models' strengths and limitations in privacy recognition and task utility.
Contribution
It provides the first trajectory-based privacy benchmark for GUI agents, supporting multiple evaluation tasks and revealing key challenges in privacy detection and protection.
Findings
Models can detect private information presence but struggle with localization and categorization.
Closed-source models like Claude Sonnet 4.6 maintain task semantics after privacy protection.
Privacy recognition remains a critical bottleneck for practical GUI agents.
Abstract
As GUI agents increasingly rely on screenshots to perceive and operate digital environments, they may inadvertently expose sensitive information such as identities, accounts, locations, and behavioral traces. While existing benchmarks primarily focus on task completion, grounding, or defenses against third-party attacks, current visual privacy datasets remain largely restricted to static natural images, limiting their ability to capture the contextual dependence and task relevance of privacy risks in GUI task trajectories. To bridge this gap, we introduce GUIGuard-Bench, a first-step benchmark for studying privacy-preserving GUI agents in trajectory-based GUI workflows. GUIGuard-Bench contains 241 real GUI-agent trajectories with 4,080 screenshots across Android and PC environments. Each screenshot is annotated at the region level with privacy bounding boxes, semantic privacy…
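The Findings separate three levels of privacy recognition: noticing that a screenshot contains private information, localizing it, and naming its category. The sketch below illustrates that distinction under stated assumptions. It assumes each private region is an axis-aligned bounding box with a single category label, judges localization by intersection-over-union (IoU) at a 0.5 threshold, and matches gold regions to predictions greedily. None of these choices, nor the hypothetical names `PrivacyRegion` and `score_screenshot`, come from GUIGuard-Bench itself. The benchmark's actual metrics are not stated in this summary.

```python
from dataclasses import dataclass


# Hypothetical annotation record for one private region on a screenshot.
# box = (x1, y1, x2, y2) in pixels; field names and labels are illustrative only.
@dataclass(frozen=True)
class PrivacyRegion:
    box: tuple[float, float, float, float]
    category: str  # e.g. "identity", "account", "location" (illustrative)


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_screenshot(gold, pred, iou_threshold=0.5):
    """Score one screenshot at three increasingly strict levels (assumed metrics).

    presence:       do gold and prediction agree on whether anything is private?
    localization:   fraction of gold regions matched by a box with IoU >= threshold
    categorization: fraction of gold regions both localized and correctly labeled
    """
    presence = bool(gold) == bool(pred)
    if not gold:
        # Recall-style scores are undefined when the screenshot has no private region.
        return {"presence": presence, "localization": None, "categorization": None}

    unused = list(pred)
    localized = categorized = 0
    for g in gold:
        # Greedy one-to-one matching: take the unused prediction with the highest IoU.
        best = max(unused, key=lambda p: iou(g.box, p.box), default=None)
        if best is not None and iou(g.box, best.box) >= iou_threshold:
            unused.remove(best)
            localized += 1
            if best.category == g.category:
                categorized += 1

    return {
        "presence": presence,
        "localization": localized / len(gold),
        "categorization": categorized / len(gold),
    }


# Usage: the model notices private content, places one box well but mislabels it,
# and puts the second box in the wrong place.
gold = [PrivacyRegion((10, 10, 110, 40), "account"),
        PrivacyRegion((10, 60, 210, 90), "location")]
pred = [PrivacyRegion((12, 12, 108, 42), "identity"),
        PrivacyRegion((300, 300, 400, 330), "location")]
print(score_screenshot(gold, pred))
# {'presence': True, 'localization': 0.5, 'categorization': 0.0}
```

In this sketch, a category is credited only for regions that were also localized. As a result, a model can score perfectly on presence while scoring low on the stricter levels, which is the kind of gap the Findings describe.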
