WebPII: Benchmarking Visual PII Detection for Computer-Use Agents
Nathan Zhao

TL;DR
WebPII introduces a benchmark dataset for detecting personally identifiable information (PII) in web screenshots, together with WebRedact, a detection model that outperforms baselines, to support privacy-preserving AI applications.
Contribution
The paper presents WebPII, a large synthetic dataset with detailed annotations for PII detection in web images, and introduces WebRedact, a model demonstrating significant accuracy improvements.
Findings
WebRedact more than doubles the baseline's detection score (0.753 vs. 0.357 mAP@50).
WebPII dataset covers diverse web interfaces with detailed PII taxonomy.
Detection runs in real time on CPU, with a latency of 20 ms.
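The mAP@50 figures above depend on an overlap test between predicted and annotated boxes. As a rough illustration only, the sketch below shows the intersection-over-union (IoU) check that the "@50" conventionally refers to: a predicted box counts as a match when its IoU with a ground-truth box is at least 0.5. The `Box` layout and the function names `iou` and `is_match_at_50` are hypothetical choices for this example, not taken from the paper, and the full mAP computation (ranking by confidence, per-class averaging) is omitted.

```python
from typing import Tuple

# Hypothetical box layout for this sketch: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (may be empty).
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero so disjoint boxes yield zero intersection area.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match_at_50(pred: Box, truth: Box) -> bool:
    """True when a predicted box overlaps a ground-truth box with IoU >= 0.5."""
    return iou(pred, truth) >= 0.5


if __name__ == "__main__":
    truth = (0.0, 0.0, 10.0, 10.0)
    print(is_match_at_50((2.0, 0.0, 12.0, 10.0), truth))  # shifted by 2 px
    print(is_match_at_50((5.0, 0.0, 15.0, 10.0), truth))  # shifted by 5 px
```

For the reported scores, 0.753 / 0.357 is roughly 2.11, which is the sense in which WebRedact "more than doubles" the baseline.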
Abstract
Computer use agents create new privacy risks: training data collected from real websites inevitably contains sensitive information, and cloud-hosted inference exposes user screenshots. Detecting personally identifiable information in web screenshots is critical for privacy-preserving deployment, but no public benchmark exists for this task. We introduce WebPII, a fine-grained synthetic benchmark of 44,865 annotated e-commerce UI images designed with three key properties: extended PII taxonomy including transaction-level identifiers that enable reidentification, anticipatory detection for partially-filled forms where users are actively entering data, and scalable generation through VLM-based UI reproduction. Experiments validate that these design choices improve layout-invariant detection across diverse interfaces and generalization to held-out page types. We train WebRedact to…
Taxonomy
Topics: Privacy, Security, and Data Protection · Advanced Malware Detection Techniques · Web Data Mining and Analysis
