CRAFT: Grounded Multi-Agent Coordination Under Partial Information

Abhijnan Nath; Hannah VanderHoeven; Nikhil Krishnaswamy

arXiv:2603.25268·cs.CL·April 29, 2026

TL;DR

CRAFT is a benchmark for multi-agent coordination among language models under partial information. It exposes the limits of current models in pragmatic communication and collaboration.

Contribution

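The paper formalizes the task as a multi-sender Bounded Pragmatic Speaker problem, provides a diagnostic framework that decomposes failures into spatial grounding, belief modeling, and pragmatic communication errors, and evaluates 15 models (8 open-weight, 7 frontier) to show where coordination breaks down.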

Findings

01. Stronger reasoning does not always improve coordination.

02. Smaller open-weight models can outperform larger frontier models.

03. Improved individual communication does not ensure successful collaboration.

Abstract

We introduce CRAFT, a multi-agent benchmark for evaluating pragmatic communication in large language models under strict partial information. In this setting, multiple agents with complementary but incomplete views must coordinate through natural language to construct a shared 3D structure that no single agent can fully observe. We formalize this problem as a multi-sender Bounded Pragmatic Speaker problem and provide a diagnostic framework that decomposes failures into spatial grounding, belief modeling and pragmatic communication errors, including a taxonomy of behavioral failure profiles in both frontier and open-weight models. Across a diverse set of models, including 8 open-weight and 7 frontier including reasoning models, we find that stronger reasoning ability does not reliably translate to better coordination: smaller open-weight models often match or outperform frontier systems,…
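The partial-information setting described in the abstract can be illustrated with a toy sketch. Everything below is hypothetical and is not taken from the CRAFT codebase: the `Block` grid representation, the 2x2x2 `target`, the three agent names, and their visibility rules are assumptions chosen only to show what "complementary but incomplete views" means, namely that no single view covers the structure while the union of views does.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: a block is an (x, y, z) grid cell.
Block = Tuple[int, int, int]

# Hypothetical target structure: a 2x2x2 cube (8 blocks).
target: Set[Block] = {
    (x, y, z) for x in range(2) for y in range(2) for z in range(2)
}

# Hypothetical partial views; each agent observes only part of the target.
views: Dict[str, Set[Block]] = {
    "agent_a": {b for b in target if b[0] == 0},                    # x == 0 face
    "agent_b": {b for b in target if b[1] == 1},                    # y == 1 face
    "agent_c": {b for b in target if b[2] == 1 or b == (1, 0, 0)},  # top layer plus one cell
}


def no_single_agent_suffices(views: Dict[str, Set[Block]], target: Set[Block]) -> bool:
    """True if every agent's view is a proper subset of the target."""
    return all(view < target for view in views.values())


def jointly_sufficient(views: Dict[str, Set[Block]], target: Set[Block]) -> bool:
    """True if the union of all views recovers the full target."""
    return set().union(*views.values()) == target


def missing_for(agent: str) -> Set[Block]:
    """Blocks the given agent cannot see and must learn about from others."""
    return target - views[agent]


if __name__ == "__main__":
    print(no_single_agent_suffices(views, target))
    print(jointly_sufficient(views, target))
    print(sorted(missing_for("agent_a")))
```

In this toy setup, the blocks returned by `missing_for` are exactly what an agent would need to obtain through natural-language messages from the others. The benchmark itself concerns how well language models carry out that exchange, which this sketch does not model.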

Code & Models

Repositories

csu-signal/CRAFT (GitHub)

Datasets

Abhijnan/craft-benchmark-lean (dataset, 421 downloads)
