Why Do LLMs Struggle in Strategic Play? Broken Links Between Observations, Beliefs, and Actions
Jan Sobotka, Mustafa O. Karabag, Ufuk Topcu

TL;DR
This paper investigates why large language models struggle with strategic decision-making under incomplete information, and identifies two gaps in their internal processing that affect their performance: one between observations and beliefs, and one between beliefs and actions.
Contribution
It uncovers two fundamental gaps (observation-belief and belief-action) in LLMs' internal decision processes during strategic play, supported by experiments with the open-weight models Llama 3.1, Qwen3, and gpt-oss.
Findings
LLMs' internal beliefs are more accurate than their verbal reports but are brittle.
Belief accuracy degrades with multi-hop reasoning and shows primacy and recency biases, and beliefs drift away from Bayesian coherence over extended interactions.
Implicit belief-to-action conversion is weaker than conversion from externalized beliefs, which affects game payoffs.
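To make "Bayesian coherence" in the findings above concrete, the following is a minimal sketch of what a coherent belief update over a latent game state looks like, and of one way to quantify how far a reported belief sits from it. This is an illustration under stated assumptions, not the paper's method: the two-state card game, the likelihood values, the reported belief, and the use of KL divergence as the distance measure are all hypothetical choices made here.

```python
import math


def bayes_update(prior, likelihoods):
    """Return the posterior over hidden states after one observation.

    prior:       dict mapping state -> P(state)
    likelihoods: dict mapping state -> P(observation | state)
    """
    unnormalized = {s: prior[s] * likelihoods[s] for s in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("observation has zero probability under the prior")
    return {s: p / total for s, p in unnormalized.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats; eps guards against log(0) when q has zeros."""
    return sum(
        p[s] * math.log((p[s] + eps) / (q[s] + eps)) for s in p if p[s] > 0
    )


# Hypothetical toy game: the opponent holds a "high" or "low" card.
prior = {"high": 0.5, "low": 0.5}

# Assumed likelihood of observing a bet under each hidden state.
bet_likelihood = {"high": 0.8, "low": 0.2}

# Coherent posterior after seeing the opponent bet.
posterior = bayes_update(prior, bet_likelihood)

# Hypothetical belief reported by a model after the same observation.
reported = {"high": 0.6, "low": 0.4}

# A larger value means the reported belief is further from the Bayesian one.
coherence_gap = kl_divergence(posterior, reported)
print(posterior, coherence_gap)
```

Tracking a quantity like `coherence_gap` turn by turn is one simple way to picture drift over an extended interaction: a coherent agent keeps it near zero, while a drifting one lets it grow.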
Abstract
Large language models (LLMs) are increasingly tasked with strategic decision-making under incomplete information, such as in negotiation and policymaking. While LLMs can excel at many such tasks, they also fail in ways that are poorly understood. We shed light on these failures by uncovering two fundamental gaps in the internal mechanisms underlying the decision-making of LLMs in incomplete-information games, supported by experiments with open-weight models Llama 3.1, Qwen3, and gpt-oss. First, an observation-belief gap: LLMs encode internal beliefs about latent game states that are substantially more accurate than their own verbal reports, yet these beliefs are brittle. In particular, the belief accuracy degrades with multi-hop reasoning, exhibits primacy and recency biases, and drifts away from Bayesian coherence over extended interactions. Second, a belief-action gap: The implicit…
