CooperBench: Why Coding Agents Cannot be Your Teammates Yet
Arpandeep Khatua, Hao Zhu, Peter Tran, Arya Prabhudesai, Frederic Sadrieh, Johann K. Lieberwirth, Xinkai Yu, Yicheng Fu, Michael J. Ryan, Jiaxin Pei, Diyi Yang

TL;DR
CooperBench evaluates how well current AI coding agents collaborate. It reveals significant coordination failures and argues that agents need social intelligence to function as effective teammates.
Contribution
Introduces CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages, documents the coordination failures of current agents, and proposes shifting focus toward social intelligence in AI agents.
Findings
Agents achieve on average 30% lower success rates when working together than when performing both tasks individually.
Communication issues include vagueness and misalignment.
Emergent behaviors such as role division were observed.
Abstract
Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This…
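The comparison behind the "curse of coordination" can be sketched as a gap between two success rates: one measured when a single agent performs both tasks, and one measured when two agents each implement one feature and must coordinate. The snippet below is a minimal illustration under stated assumptions, not the benchmark's own evaluation code. The `TaskResult` record, the `coordination_gap` helper, and the toy outcomes are all hypothetical, and the abstract does not say whether the 30% figure is an absolute or a relative drop, so both are computed.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Outcome of one hypothetical task (success means the tests for both features pass)."""
    solo_success: bool  # a single agent implemented both features
    coop_success: bool  # two agents implemented one feature each and coordinated


def success_rate(outcomes):
    """Fraction of True values in a list of booleans (0.0 for an empty list)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def coordination_gap(results):
    """Return (solo rate, cooperative rate, absolute drop, relative drop)."""
    solo = success_rate([r.solo_success for r in results])
    coop = success_rate([r.coop_success for r in results])
    absolute = solo - coop
    relative = absolute / solo if solo > 0 else 0.0
    return solo, coop, absolute, relative


# Made-up outcomes for illustration only; these are NOT results from the paper.
toy_results = [
    TaskResult(solo_success=True, coop_success=True),
    TaskResult(solo_success=True, coop_success=False),
    TaskResult(solo_success=True, coop_success=False),
    TaskResult(solo_success=False, coop_success=False),
    TaskResult(solo_success=True, coop_success=True),
]

solo, coop, absolute, relative = coordination_gap(toy_results)
print(f"solo={solo:.2f} coop={coop:.2f} "
      f"absolute_drop={absolute:.2f} relative_drop={relative:.2f}")
```

Reporting both the absolute and the relative drop avoids committing to one reading of "30% lower"; the paper itself should be consulted for the exact definition.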
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Multi-Agent Systems and Negotiation · Team Dynamics and Performance
