Don't Make the LLM Read the Graph: Make the Graph Think
Yuqi Sun, Tianqin Meng, George Liu, Yashraj Panwar, Lakshya Chaudhry, Munasib Ilham, Aman Chadha

TL;DR
This study explores how explicit belief graphs influence large language models' performance in multi-agent reasoning within the game Hanabi, revealing architecture-dependent benefits, model-specific failures, and the importance of combined belief components.
Contribution
It demonstrates that belief graphs can enhance LLM multi-agent reasoning depending on architecture and integration method, and identifies model-specific failures and optimal graph complexity.
Findings
Belief graphs improve performance mainly when used as action gatekeepers for strong models.
Some model families exhibit "Planner Defiance," overriding correct planner recommendations at partial competence.
Combining belief components yields better results than individual parts.
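To make the "action gatekeeper" idea concrete, here is a minimal sketch of gating action selection through a ranked shortlist. It is an illustration only, not the authors' implementation: the function names (`shortlist`, `gated_choice`), the score values, the action labels, and the fallback rule (take the top-ranked action when the model picks something outside the shortlist) are all assumptions.

```python
from typing import Callable, Dict, List


def shortlist(scores: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k highest-scoring actions (hypothetical belief-graph scores).

    Ties are broken by action name so the ranking is deterministic.
    """
    if not scores:
        raise ValueError("scores must contain at least one action")
    ranked = sorted(scores, key=lambda action: (-scores[action], action))
    return ranked[:k]


def gated_choice(
    scores: Dict[str, float],
    choose: Callable[[List[str]], str],
    k: int = 3,
) -> str:
    """Let `choose` (a stand-in for the LLM) pick only from the shortlist.

    Assumed fallback: if the pick is not on the shortlist, use the
    top-ranked action instead.
    """
    allowed = shortlist(scores, k)
    pick = choose(allowed)
    return pick if pick in allowed else allowed[0]


# Illustrative scores only; these numbers are not from the paper.
scores = {
    "play_card_0": 0.9,
    "hint_red": 0.7,
    "discard_card_4": 0.2,
    "play_card_3": 0.05,
}

# A stand-in "model" that ignores the shortlist and picks a low-ranked action.
result = gated_choice(scores, lambda allowed: "play_card_3", k=2)
print(result)
```

In this sketch the graph constrains the action space rather than only adding context to the prompt, which is the distinction the first finding draws.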
Abstract
We investigate whether explicit belief graphs improve LLM performance in cooperative multi-agent reasoning. Through 3,000+ controlled trials across four LLM families in the cooperative card game Hanabi, we establish four findings. First, integration architecture determines whether belief graphs provide value: as prompt context, graphs are decorative for strong models and beneficial only for weak models on 2nd-order Theory of Mind (80% vs 10%, p<0.0001, OR=36.0); when graphs gate action selection through ranked shortlists, they become structurally essential even for strong models (100% vs 20% on 2nd-order ToM, p<0.001). Second, we identify "Planner Defiance," a model-family-specific failure where LLMs override correct planner recommendations at partial competence (90% override, replicated N=20); Gemini models show near-zero defiance while Llama 70B shows 90%, and models distinguish…
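The odds ratio reported for the weak-model, prompt-context condition can be checked against the two success rates. The sketch below assumes the figure is the plain ratio of odds computed from the reported proportions (80% vs 10%), with no continuity correction; the function name `odds_ratio` is illustrative.

```python
def odds_ratio(p_with: float, p_without: float) -> float:
    """Ratio of the odds of success with the graph to the odds without it."""
    odds_with = p_with / (1.0 - p_with)
    odds_without = p_without / (1.0 - p_without)
    return odds_with / odds_without


# Success rates quoted in the abstract: 80% with the graph, 10% without.
or_value = odds_ratio(0.80, 0.10)
print(round(or_value, 1))
```

The odds are 4 to 1 with the graph and 1 to 9 without it, so the ratio is 4 × 9 = 36, matching the reported OR=36.0.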
