SOTOPIA-TOM: Evaluating Information Management in Multi-Agent Interaction with Theory of Mind
Yashwanth YS, Ruichen Wang, Shihua Zeng, Xuhui Zhou, Koichi Onoue, Vasudha Varadarajan, Maarten Sap

TL;DR
This paper introduces SOTOPIA-TOM, a comprehensive benchmark to evaluate LLM agents' ability to manage information asymmetry and privacy in multi-agent interactions, revealing current limitations and potential improvements.
Contribution
It presents a new multi-dimensional evaluation framework and environment for assessing privacy-aware, theory-of-mind capabilities in multi-agent LLM interactions.
Findings
Even the largest models score only 62% on the INFOMGMT metric.
ToM-based interventions improve privacy and coordination, reducing privacy violations significantly.
Current LLM agents show persistent deficiencies in information seeking and privacy-aware decision-making.
Abstract
As LLM-based agents increasingly interact in multi-party settings, they need to handle information asymmetry properly, i.e., to know when and to whom disclosing information is appropriate. Yet existing benchmarks fail to measure this ability in realistic multi-party settings. We therefore introduce SOTOPIA-TOM, a multi-dimensional benchmarking framework to evaluate LLM agents' ability to navigate information-asymmetric, privacy-sensitive multi-party interactions. We create an interaction environment that enables both public (broadcast) and private (direct message) communication, and craft 160 human-reviewed scenarios across eight industry sectors, each involving 3 to 5 agents with partitioned private knowledge and channel-dependent sharing policies. To measure interaction abilities, we create a multi-dimensional evaluation framework to assess how well agents share…
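To make "partitioned private knowledge and channel-dependent sharing policies" concrete, here is a minimal Python sketch of how such a scenario could be represented and checked. It is an illustration under stated assumptions, not the benchmark's actual schema or scoring code: the names `Channel`, `Fact`, `Message`, and `violations`, and the rule that a fact may reach a recipient only over a channel that explicitly lists them, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    PUBLIC = "public"    # broadcast: every other agent sees the message
    PRIVATE = "private"  # direct message: only the named recipients see it


@dataclass
class Fact:
    """One piece of private knowledge held by a single agent (hypothetical schema)."""
    fact_id: str
    owner: str
    # Channel -> set of agents allowed to learn the fact over that channel.
    # A channel missing from the mapping means "never share here".
    allowed: dict


@dataclass
class Message:
    """A message that discloses zero or more facts (hypothetical schema)."""
    sender: str
    channel: Channel
    recipients: frozenset
    fact_ids: tuple


def violations(message, facts):
    """Return (fact_id, recipient) pairs that the assumed policy does not permit."""
    found = []
    for fact_id in message.fact_ids:
        permitted = facts[fact_id].allowed.get(message.channel, frozenset())
        for recipient in sorted(message.recipients):
            if recipient not in permitted:
                found.append((fact_id, recipient))
    return found


# Alice holds a budget figure that only Bob may learn, and only by direct message.
facts = {
    "budget": Fact("budget", "alice", {Channel.PRIVATE: frozenset({"bob"})}),
}

# Broadcasting the fact exposes it on a channel where nobody is permitted.
broadcast = Message("alice", Channel.PUBLIC, frozenset({"bob", "carol"}), ("budget",))

# Sending it to Bob directly stays within the policy.
direct = Message("alice", Channel.PRIVATE, frozenset({"bob"}), ("budget",))

broadcast_violations = violations(broadcast, facts)
direct_violations = violations(direct, facts)
```

Under this assumed policy the broadcast is flagged for both Bob and Carol, because permission is tied to the channel as well as the recipient, while the direct message to Bob is not flagged.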
