Observer, Not Player: Simulating Theory of Mind in LLMs through Game Observation
Jerry Wang, Ting Yiu Liu

TL;DR
This paper introduces an interactive framework for evaluating whether large language models (LLMs) can exhibit mind-like reasoning: the model observes Rock-Paper-Scissors play and identifies the strategies in use, with an emphasis on interpretability and systematic assessment.
Contribution
The paper develops a benchmark and evaluation metrics for assessing how well LLMs understand strategic behavior through game observation, focusing on the interpretability and stability of strategy identification.
Findings
LLMs can identify strategies with moderate accuracy.
The framework reveals strengths and limitations in LLM reasoning.
Unified metrics effectively measure alignment and calibration.
Abstract
We present an interactive framework for evaluating whether large language models (LLMs) exhibit genuine "understanding" in a simple yet strategic environment. As a running example, we focus on Rock-Paper-Scissors (RPS), which, despite its apparent simplicity, requires sequential reasoning, adaptation, and strategy recognition. Our system positions the LLM as an Observer whose task is to identify which strategies are being played and to articulate the reasoning behind this judgment. The purpose is not to test knowledge of Rock-Paper-Scissors itself, but to probe whether the model can exhibit mind-like reasoning about sequential behavior. To support systematic evaluation, we provide a benchmark consisting of both static strategies and lightweight dynamic strategies specified by well-prompted rules. We quantify alignment between the Observer's predictions and the ground-truth distributions…
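To make the idea of comparing an Observer's prediction with a ground-truth distribution concrete, here is a minimal sketch. It is not the paper's implementation: the abstract is truncated before it names its metrics, so total variation distance is used here purely as an assumed stand-in, and a simple frequency count plays the role of the Observer in place of an LLM. The names `sample_moves`, `empirical_distribution`, `total_variation`, and the example strategy `truth` are all hypothetical.

```python
import random
from collections import Counter

MOVES = ("rock", "paper", "scissors")


def sample_moves(dist, n, seed=0):
    """Draw n moves from a static strategy given as a move -> probability dict."""
    rng = random.Random(seed)
    weights = [dist[m] for m in MOVES]
    return rng.choices(MOVES, weights=weights, k=n)


def empirical_distribution(moves):
    """Naive stand-in Observer: estimate the strategy from move frequencies."""
    counts = Counter(moves)
    total = len(moves)
    return {m: counts[m] / total for m in MOVES}


def total_variation(p, q):
    """Assumed alignment measure: 0 means identical, 1 means disjoint."""
    return 0.5 * sum(abs(p[m] - q[m]) for m in MOVES)


# Hypothetical static strategy that favors rock.
truth = {"rock": 0.6, "paper": 0.3, "scissors": 0.1}

observed = sample_moves(truth, n=500)
estimate = empirical_distribution(observed)
gap = total_variation(truth, estimate)

print("estimated strategy:", estimate)
print("total variation gap:", round(gap, 3))
```

In an actual evaluation the frequency-count Observer would be replaced by the LLM's stated distribution over candidate strategies, and the distance would be replaced by whichever metrics the paper defines.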
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Games · Multimodal Machine Learning Applications
