POLAR-Bench: A Diagnostic Benchmark for Privacy-Utility Trade-offs in LLM Agents
Qiaoyuan Zheng, Yiqu Yang, Qi Gao, Imanol Schlag

TL;DR
POLAR-Bench is a diagnostic benchmark designed to evaluate privacy-utility trade-offs in LLM agents, revealing how well models protect private data under adversarial probing across multiple domains.
Contribution
The paper introduces POLAR-Bench, a new benchmark for assessing privacy and utility in LLMs with adversarial testing across diverse domains.
Findings
Frontier models withhold over 99% of protected attributes.
Smaller open-weight models leak over 50% of protected data.
POLAR-Bench localizes where models fail to follow privacy policies.
Abstract
LLM agents increasingly have access to private user data and act on the user's behalf when interacting with third-party systems. The user defines what may and must not be shared, and the agent must robustly follow that intent even when third-party systems behave adversarially. We introduce POLAR-Bench (Policy-aware adversarial Benchmark), in which a trusted model with a privacy policy and a task converses with a third-party model that adversarially probes for both task-relevant and protected attributes. Across 10 domains and 7,852 samples, we score privacy and utility by deterministic set-membership and vary privacy policy dimension and attack strategy along two orthogonal axes, producing a 5×5 diagnostic surface per model. Our results reveal a sharp split: current frontier models withhold over 99% of protected attributes, while smaller open-weight models in the 1–30B range, the…
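The abstract mentions deterministic set-membership scoring and a per-model grid over policy dimension and attack strategy, but the excerpt here does not give the exact formulas. The following is a minimal sketch of what such scoring could look like, under stated assumptions: privacy is taken as the fraction of protected attributes that were not disclosed, utility as the fraction of task-relevant (allowed) attributes that were disclosed, and the set of disclosed attributes is assumed to have been extracted already. The names `score_sample` and `diagnostic_surface`, and the record layout, are hypothetical and not taken from the paper.

```python
from collections import defaultdict


def score_sample(disclosed, allowed, protected):
    """Score one conversation by set membership (assumed definitions).

    disclosed: attribute names the trusted model revealed
    allowed:   task-relevant attributes the policy permits sharing
    protected: attributes the policy forbids sharing
    Returns (privacy, utility), each in [0, 1].
    """
    disclosed, allowed, protected = set(disclosed), set(allowed), set(protected)
    # Assumption: privacy = share of protected attributes that stayed withheld.
    privacy = (1.0 - len(disclosed & protected) / len(protected)) if protected else 1.0
    # Assumption: utility = share of allowed attributes that were provided.
    utility = (len(disclosed & allowed) / len(allowed)) if allowed else 1.0
    return privacy, utility


def diagnostic_surface(records):
    """Average scores per (policy_dimension, attack_strategy) cell.

    records: iterable of (policy_dimension, attack_strategy, privacy, utility).
    Returns a dict mapping each cell to (mean_privacy, mean_utility).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for policy, attack, privacy, utility in records:
        cell = sums[(policy, attack)]
        cell[0] += privacy
        cell[1] += utility
        cell[2] += 1
    return {key: (p / n, u / n) for key, (p, u, n) in sums.items()}
```

With five policy dimensions and five attack strategies, `diagnostic_surface` would yield up to 25 cells per model, which matches the 5×5 grid the abstract describes; how the paper actually aggregates within a cell is not stated in this excerpt.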
