DPBench: Large Language Models Struggle with Simultaneous Coordination
Najmul Hasan, Prashanth BusiReddyGari

TL;DR
DPBench is a new benchmark that tests whether large language models can coordinate in multi-agent scenarios. It shows that they struggle with simultaneous decision-making and are prone to deadlock, which points to the need for external coordination mechanisms.
Contribution
Introduces DPBench, a benchmark based on the Dining Philosophers problem that evaluates LLM coordination under resource contention across eight conditions varying decision timing, group size, and communication.
Findings
LLMs coordinate well when decisions are made sequentially but fail when decisions are made simultaneously.
Deadlock rates exceed 95% in some conditions.
Enabling communication does not resolve deadlock and can even increase deadlock rates.
Abstract
Large language models are increasingly deployed in multi-agent systems, yet we lack benchmarks that test whether they can coordinate under resource contention. We introduce DPBench, a benchmark based on the Dining Philosophers problem that evaluates LLM coordination across eight conditions that vary decision timing, group size, and communication. Our experiments with GPT-5.2, Claude Opus 4.5, and Grok 4.1 reveal a striking asymmetry: LLMs coordinate effectively in sequential settings but fail when decisions must be made simultaneously, with deadlock rates exceeding 95% under some conditions. We trace this failure to convergent reasoning, where agents independently arrive at identical strategies that, when executed simultaneously, guarantee deadlock. Contrary to expectations, enabling communication does not resolve this problem and can even increase deadlock rates. Our findings suggest…
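The asymmetry described in the abstract can be illustrated with a toy simulation. The sketch below is not the DPBench implementation and involves no LLMs: it assumes a hypothetical deterministic rule (`decide`) that every philosopher follows identically, standing in for the "convergent reasoning" the abstract mentions. The function names, the action vocabulary, and the rule itself ("take the left fork only if both forks look free") are illustrative assumptions.

```python
from typing import List, Optional, Tuple

def decide(i: int, holder: List[Optional[int]]) -> str:
    """Hypothetical shared strategy: every philosopher applies this same rule."""
    n = len(holder)
    left, right = i, (i + 1) % n
    has_left = holder[left] == i
    has_right = holder[right] == i
    if has_left and has_right:
        return "release"  # eat, then put both forks down
    if has_left:
        return "grab_right" if holder[right] is None else "wait"
    # Only start if both forks look free in the state this philosopher observes.
    if holder[left] is None and holder[right] is None:
        return "grab_left"
    return "wait"

def apply(i: int, action: str, holder: List[Optional[int]], meals: List[int]) -> None:
    """Apply one action; a grab only succeeds if the fork is still free."""
    n = len(holder)
    left, right = i, (i + 1) % n
    if action == "grab_left" and holder[left] is None:
        holder[left] = i
    elif action == "grab_right" and holder[right] is None:
        holder[right] = i
    elif action == "release":
        meals[i] += 1
        holder[left] = holder[right] = None

def simulate(n: int = 5, simultaneous: bool = True, max_rounds: int = 50) -> Tuple[bool, int]:
    """Return (deadlocked, total_meals) for n >= 2 philosophers."""
    holder: List[Optional[int]] = [None] * n  # holder[f] = philosopher holding fork f
    meals = [0] * n
    for _ in range(max_rounds):
        if simultaneous:
            # Everyone decides on the same snapshot, then all actions are applied.
            actions = [decide(i, holder) for i in range(n)]
            for i, action in enumerate(actions):
                apply(i, action, holder, meals)
        else:
            # Each philosopher sees the effects of earlier moves in the round.
            actions = []
            for i in range(n):
                action = decide(i, holder)
                apply(i, action, holder, meals)
                actions.append(action)
        # If nobody can act, the state can never change again: deadlock.
        if all(action == "wait" for action in actions):
            return True, sum(meals)
    return False, sum(meals)

if __name__ == "__main__":
    print("simultaneous:", simulate(5, simultaneous=True))
    print("sequential:  ", simulate(5, simultaneous=False))
```

Under these assumptions, the simultaneous run deadlocks with no meals eaten, because every philosopher sees two free forks and takes the left one at the same moment. The sequential run keeps making progress, because the last philosopher to move sees that its right fork is already taken and waits.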
Taxonomy
Topics: Natural Language Processing Techniques · Big Data and Digital Economy · Language and cultural evolution
