CooperBench: Why Coding Agents Cannot be Your Teammates Yet

Arpandeep Khatua; Hao Zhu; Peter Tran; Arya Prabhudesai; Frederic Sadrieh; Johann K. Lieberwirth; Xinkai Yu; Yicheng Fu; Michael J. Ryan; Jiaxin Pei; Diyi Yang

arXiv:2601.13295 · cs.LG · January 27, 2026

TL;DR

CooperBench evaluates how well current AI coding agents collaborate, revealing significant coordination failures and underscoring the need to develop social intelligence in AI teammates.

Contribution

Introduces CooperBench, a benchmark of over 600 collaborative coding tasks; highlights the coordination challenges it exposes; and argues for shifting research focus toward social intelligence in AI agents.

Findings

1. Agents achieve, on average, 30% lower success rates when collaborating than when performing both tasks individually.

2. Communication issues include vagueness and misalignment.

3. Emergent behaviors such as role division are observed.

Abstract

Resolving team conflicts requires not only task-specific competence, but also social intelligence to find common ground and build consensus. As AI agents increasingly collaborate on complex work, they must develop coordination capabilities to function as effective teammates. Yet we hypothesize that current agents lack these capabilities. To test this, we introduce CooperBench, a benchmark of over 600 collaborative coding tasks across 12 libraries in 4 programming languages. Each task assigns two agents different features that can be implemented independently but may conflict without proper coordination. Tasks are grounded in real open-source repositories with expert-written tests. Evaluating state-of-the-art coding agents, we observe the curse of coordination: agents achieve on average 30% lower success rates when working together compared to performing both tasks individually. This…

Code & Models

Datasets

CodeConflict/cooperbench-dataset

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Multi-Agent Systems and Negotiation · Team Dynamics and Performance