CRAFT: Grounded Multi-Agent Coordination Under Partial Information
Abhijnan Nath, Hannah VanderHoeven, Nikhil Krishnaswamy

TL;DR
CRAFT is a benchmark for multi-agent coordination among language models under partial information. It reveals the limitations of current models in pragmatic communication and collaboration.
Contribution
The paper formalizes a multi-sender pragmatic communication problem, provides a diagnostic framework for coordination failures, and evaluates a diverse set of models, exposing where their coordination breaks down.
Findings
Stronger reasoning does not always improve coordination.
Smaller open-weight models can outperform larger frontier models.
Improved individual communication does not ensure successful collaboration.
Abstract
We introduce CRAFT, a multi-agent benchmark for evaluating pragmatic communication in large language models under strict partial information. In this setting, multiple agents with complementary but incomplete views must coordinate through natural language to construct a shared 3D structure that no single agent can fully observe. We formalize this problem as a multi-sender Bounded Pragmatic Speaker problem and provide a diagnostic framework that decomposes failures into spatial grounding, belief modeling, and pragmatic communication errors, including a taxonomy of behavioral failure profiles in both frontier and open-weight models. Across a diverse set of models (8 open-weight and 7 frontier, including reasoning models), we find that stronger reasoning ability does not reliably translate to better coordination: smaller open-weight models often match or outperform frontier systems,…
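To make the partial-information setting concrete, here is a minimal toy sketch in Python. It is not the benchmark's actual environment or data format: the voxel-set representation, the axis-aligned projections standing in for agent views, and the names `project` and `candidates` are all assumptions chosen for illustration. It shows only the core idea that each view alone is consistent with several structures, while the pooled views can pin down the target.

```python
from itertools import combinations, product

def project(structure, axis):
    """Hypothetical agent view: the structure's silhouette along one axis."""
    return frozenset(cell[:axis] + cell[axis + 1:] for cell in structure)

def candidates(views, size=2):
    """Brute-force every structure in a size^3 grid that matches all given views.

    `views` maps an axis index to the silhouette observed along that axis.
    """
    cells = list(product(range(size), repeat=3))
    found = []
    for k in range(len(cells) + 1):
        for subset in combinations(cells, k):
            structure = frozenset(subset)
            if all(project(structure, axis) == view for axis, view in views.items()):
                found.append(structure)
    return found

# Assumed toy target: two occupied cells in a 2x2x2 grid.
target = frozenset({(0, 0, 0), (1, 0, 1)})
views = {axis: project(target, axis) for axis in range(3)}

alone = candidates({0: views[0]})                # one agent's view only
pair = candidates({0: views[0], 2: views[2]})    # two agents pooling views
pooled = candidates(views)                       # all three views combined

print(len(alone), len(pair), len(pooled))
```

In this toy, ambiguity shrinks only as views are combined. In the benchmark the combining has to happen through natural-language messages rather than direct access to the other agents' observations, which this sketch does not model.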
