ConVerse: Benchmarking Contextual Safety in Agent-to-Agent Conversations
Amr Gomaa, Ahmed Salem, Sahar Abdelnabi

TL;DR
ConVerse is a benchmark for evaluating privacy and security vulnerabilities in agent-to-agent conversations across three domains (travel, real estate, insurance). It shows that current language models remain at persistent risk.
Contribution
The paper introduces a multi-domain, multi-turn benchmark for assessing privacy and security risks in agent-to-agent interactions.
Findings
Privacy attacks succeed in up to 88% of cases.
Security breaches occur in up to 60% of interactions.
Stronger models tend to leak more information.
Abstract
As language models evolve into autonomous agents that act and communicate on behalf of users, ensuring safety in multi-agent ecosystems becomes a central challenge. Interactions between personal assistants and external service providers expose a core tension between utility and protection: effective collaboration requires information sharing, yet every exchange creates new attack surfaces. We introduce ConVerse, a dynamic benchmark for evaluating privacy and security risks in agent-agent interactions. ConVerse spans three practical domains (travel, real estate, insurance) with 12 user personas and over 864 contextually grounded attacks (611 privacy, 253 security). Unlike prior single-agent settings, it models autonomous, multi-turn agent-to-agent conversations where malicious requests are embedded within plausible discourse. Privacy is tested through a three-tier taxonomy assessing…
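The composition figures quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the `BenchmarkComposition` class and its field names are hypothetical and are not part of the ConVerse release; only the numbers (three domains, 12 personas, 611 privacy and 253 security attacks) come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkComposition:
    """Hypothetical container for the counts stated in the abstract."""
    domains: tuple
    num_personas: int
    privacy_attacks: int
    security_attacks: int

    @property
    def total_attacks(self) -> int:
        # Total is the sum of the two attack categories.
        return self.privacy_attacks + self.security_attacks

    def share(self, count: int) -> float:
        # Fraction of all attacks represented by `count`.
        return count / self.total_attacks


# Values taken from the abstract; the structure itself is an assumption.
converse = BenchmarkComposition(
    domains=("travel", "real estate", "insurance"),
    num_personas=12,
    privacy_attacks=611,
    security_attacks=253,
)

print(converse.total_attacks)  # 611 + 253 = 864
print(f"privacy share:  {converse.share(converse.privacy_attacks):.1%}")
print(f"security share: {converse.share(converse.security_attacks):.1%}")
```

By this count, the two categories account for exactly the 864 attacks quoted, with privacy attacks making up roughly seven in ten.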
Taxonomy
Topics: Social Robot Interaction and HRI · AI in Service Interactions · Persona Design and Applications
