Cooperative Profiles Predict Multi-Agent LLM Team Performance in AI for Science Workflows
Shivani Kumar, Adarsh Bharathwaj, David Jurgens

TL;DR
This paper demonstrates that cooperative behavior profiles derived from behavioral economics games can predict the performance of multi-agent LLM teams in scientific workflows, offering a diagnostic tool for assessing cooperation.
Contribution
It introduces a benchmarking approach linking game-based cooperative profiles of LLMs to their effectiveness in collaborative scientific tasks.
Findings
Game-derived cooperative profiles predict downstream scientific performance.
Models that invest in team production outperform models that follow greedy strategies.
Cooperative disposition is a measurable property independent of general ability.
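To make the contrast between team investment and greedy play concrete, here is a minimal sketch of a linear public goods game, a standard behavioral economics setup. The summary does not name the six games the paper uses, so the choice of game, the function name `public_goods_payoffs`, and the `endowment` and `multiplier` values are all illustrative assumptions, not the authors' benchmark.

```python
def public_goods_payoffs(contributions, endowment=10.0, multiplier=2.0):
    """Payoffs in a one-shot linear public goods game (illustrative only).

    Each player keeps whatever they do not contribute; the pooled
    contributions are multiplied and split equally among all players.
    The parameter values are assumptions, not taken from the paper.
    """
    n = len(contributions)
    share = multiplier * sum(contributions) / n
    return [endowment - c + share for c in contributions]


# Hypothetical four-agent teams with different contribution patterns.
all_invest = public_goods_payoffs([10.0, 10.0, 10.0, 10.0])
all_greedy = public_goods_payoffs([0.0, 0.0, 0.0, 0.0])
one_free_rider = public_goods_payoffs([0.0, 10.0, 10.0, 10.0])

for name, payoffs in [
    ("all invest", all_invest),
    ("all greedy", all_greedy),
    ("one free rider", one_free_rider),
]:
    print(f"{name}: individual={payoffs}, team total={sum(payoffs)}")
```

In this toy setting the free rider earns the most individually, while the team total is highest when everyone invests. That tension is the kind of cooperation mechanism such games are designed to isolate.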
Abstract
Multi-agent systems built from teams of large language models (LLMs) are increasingly deployed for collaborative scientific reasoning and problem-solving. These systems require agents to coordinate under shared constraints, such as GPUs or credit balances, where cooperative behavior matters. Behavioral economics provides a rich toolkit of games that isolate distinct cooperation mechanisms, yet it remains unknown whether a model's behavior in these stylized settings predicts its performance in realistic collaborative tasks. Here, we benchmark 35 open-weight LLMs across six behavioral economics games and show that game-derived cooperative profiles robustly predict downstream performance in AI-for-Science tasks, where teams of LLM agents collaboratively analyze data, build models, and produce scientific reports under shared budget constraints. Models that effectively coordinate games and…
