MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation
Gurusha Juneja, Alon Albalak, Wenyue Hua, William Yang Wang

TL;DR
This paper introduces MAGPIE, a comprehensive benchmark with real-world scenarios to evaluate whether current LLM-based agents understand and preserve contextual privacy during multi-agent collaboration.
Contribution
The paper presents MAGPIE, a new high-stakes benchmark with 158 scenarios across 15 domains, to assess LLMs' understanding of privacy and their ability to collaborate without privacy violations.
Findings
Current models often misclassify private data as shareable (25.2% and 43.6% of the time).
Models disclose private information in 50-60% of multi-turn conversations despite explicit privacy instructions.
Multi-agent systems fail to complete tasks in 71% of scenarios due to privacy issues.
Abstract
The proliferation of LLM-based agents has led to increasing deployment of inter-agent collaboration for tasks such as scheduling, negotiation, and resource allocation. In such systems, privacy is critical, as agents often access proprietary tools and domain-specific databases requiring strict confidentiality. This paper examines whether LLM-based agents demonstrate an understanding of contextual privacy and, when instructed, whether these systems preserve inference-time user privacy in non-adversarial multi-turn conversations. Existing benchmarks for evaluating contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks where private information can be easily excluded. We first present a benchmark, MAGPIE, comprising 158 real-life high-stakes scenarios across 15 domains. These scenarios are designed such that complete exclusion of private data impedes task completion yet…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection
