Mechanism Design Is Not Enough: Prosocial Agents for Cooperative AI
Xuanqiang Angelo Huang, Charlie Tharas, Samuele Marro, Van Q. Truong, Bernhard Schölkopf, Emanuele La Malfa, and Zhijing Jin

TL;DR
This paper demonstrates that mechanism design alone cannot ensure cooperative AI behavior and advocates for prosocial agents that inherently consider others' welfare to improve social outcomes.
Contribution
It formally proves that mechanism design alone cannot maximize social welfare when contracts are incomplete, and shows that prosocial agents can close this gap, making AI systems safer and more cooperative.
Findings
Prosocial agents can achieve socially superior outcomes in multi-agent environments.
Mechanism design alone cannot eliminate the welfare loss that arises when contracts cannot distinguish all relevant future contingencies.
Prosociality improves performance in social dilemmas with large language model agents.
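To make "weighing others' welfare" concrete, here is a minimal sketch that assumes a textbook prisoner's dilemma with illustrative payoffs (T, R, P, S) = (5, 3, 1, 0) and a hypothetical prosocial weight `alpha`. The paper's own utility model and LLM experiments may differ. Each player maximizes its own material payoff plus `alpha` times the other player's, and the code enumerates the pure-strategy Nash equilibria of the transformed game.

```python
from itertools import product

# Illustrative prisoner's dilemma payoffs (T > R > P > S); not taken from the paper.
# PAYOFF[(a1, a2)] = (material payoff to player 1, material payoff to player 2).
PAYOFF = {
    ("C", "C"): (3.0, 3.0),
    ("C", "D"): (0.0, 5.0),
    ("D", "C"): (5.0, 0.0),
    ("D", "D"): (1.0, 1.0),
}
ACTIONS = ("C", "D")  # C = cooperate, D = defect


def prosocial_payoff(profile, alpha):
    """Each player's utility: own payoff plus alpha (hypothetical weight) times the other's."""
    u1, u2 = PAYOFF[profile]
    return (u1 + alpha * u2, u2 + alpha * u1)


def pure_nash_equilibria(alpha):
    """Profiles where neither player gains by a unilateral deviation under prosocial utilities."""
    equilibria = []
    for a1, a2 in product(ACTIONS, repeat=2):
        v1, v2 = prosocial_payoff((a1, a2), alpha)
        best1 = all(v1 >= prosocial_payoff((d, a2), alpha)[0] for d in ACTIONS)
        best2 = all(v2 >= prosocial_payoff((a1, d), alpha)[1] for d in ACTIONS)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


def social_welfare(profile):
    """Sum of material payoffs, the quantity a social planner would maximize."""
    return sum(PAYOFF[profile])


if __name__ == "__main__":
    for alpha in (0.0, 0.5, 1.0):
        eqs = pure_nash_equilibria(alpha)
        print(alpha, eqs, [social_welfare(p) for p in eqs])
```

With these payoffs, mutual defection is the only pure equilibrium at `alpha = 0`. At `alpha = 0.5` the game becomes an anti-coordination game with two asymmetric equilibria. At `alpha = 1`, mutual cooperation, which also maximizes summed material payoff, is the unique pure equilibrium. In general, cooperating is a best response to cooperation once `alpha >= (T - R) / (R - S)`, which is 2/3 here.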
Abstract
Ensuring that AI agents behave safely and beneficially when interacting with other parties has emerged as one of the central challenges of modern AI safety. While mechanism design, as the theory of designing rules to align individual and collective objectives, can incentivize cooperative behavior, it is still an open question whether it alone is sufficient to maximize LLM agents' social welfare. This work proves that the answer is negative: drawing from incomplete contract theory, we formally show that when contracts cannot distinguish all relevant future contingencies, there is a strictly positive welfare loss that no realistic mechanism can eliminate. We show that prosocial agents, who weigh others' welfare alongside their own, can close this gap and achieve outcomes that are socially superior and individually beneficial. Experimentally, we show that in multi-agent resource-allocation…
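The welfare-loss argument can be illustrated with the textbook hold-up problem from incomplete contract theory. What follows is a toy model under stated assumptions (value `theta * e`, quadratic cost `e**2 / 2`, a fixed ex-post bargaining share `beta`, and a hypothetical prosocial weight `alpha`), not the paper's formal setting or proof. Because the investment `e` cannot be written into a contract, a selfish seller underinvests. A seller who also values the buyer's payoff invests more.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HoldUp:
    """Toy hold-up model (illustrative assumptions, not the paper's model).

    A seller makes a relationship-specific investment e >= 0 that creates
    value theta * e and costs the seller e**2 / 2. The investment is not
    verifiable, so no contract can condition on it; once it is sunk, the
    surplus is split by bargaining and the seller keeps a share beta.
    """
    theta: float = 1.0
    beta: float = 0.5

    def welfare(self, e: float) -> float:
        # Total surplus: value created minus the seller's investment cost.
        return self.theta * e - e ** 2 / 2

    def first_best(self) -> float:
        # A planner maximizes theta*e - e^2/2, giving e* = theta.
        return self.theta

    def chosen_investment(self, alpha: float = 0.0) -> float:
        # The seller maximizes its own share beta*theta*e - e^2/2 plus
        # alpha times the buyer's share (1 - beta)*theta*e.
        # First-order condition: e = theta * (beta + alpha * (1 - beta)).
        return self.theta * (self.beta + alpha * (1 - self.beta))

    def welfare_loss(self, alpha: float = 0.0) -> float:
        # Gap between first-best surplus and the surplus the seller's choice yields.
        return self.welfare(self.first_best()) - self.welfare(self.chosen_investment(alpha))


if __name__ == "__main__":
    model = HoldUp()
    for alpha in (0.0, 0.5, 1.0):
        print(alpha, model.chosen_investment(alpha), model.welfare_loss(alpha))
```

With the defaults (`theta = 1`, `beta = 0.5`), a selfish seller invests 0.5 instead of the efficient 1.0, leaving a welfare loss of 0.125. Setting `alpha = 1`, so that the seller values the buyer's payoff as much as its own, restores the first-best investment and a zero loss. Weights above 1 would cause overinvestment. The toy model only shows that prosociality can remove this particular hold-up loss; it does not capture the paper's claim that no realistic mechanism can eliminate the loss.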
