How Clued up are LLMs? Evaluating Multi-Step Deductive Reasoning in a Text-Based Game Environment

Rebecca Ansell; Autumn Toney-Wails

arXiv:2603.17169 · cs.AI · March 19, 2026

TL;DR

This study evaluates the multi-step deductive reasoning of LLM agents in a text-based, multi-agent version of the board game Clue. Agents achieved only four correct wins across 18 simulated games, and fine-tuning on structured logic puzzles did not reliably improve reasoning or gameplay.

Contribution

Introduces a text-based, multi-agent implementation of Clue as a rule-based testbed for assessing multi-step deductive reasoning in LLMs, and analyzes whether fine-tuning on structured logic puzzles transfers to in-game reasoning and gameplay.

Findings

01

Across 18 simulated games, LLM agents achieved only four correct wins, indicating difficulty in sustaining deductive reasoning over a full game.

02

Fine-tuning did not consistently enhance reasoning accuracy or gameplay performance.

03

In some cases, fine-tuning appeared to increase reasoning volume without improving reasoning precision.

Abstract

Deducing whodunit proves challenging for LLM agents. In this paper, we implement a text-based multi-agent version of the classic board game Clue as a rule-based testbed for evaluating multi-step deductive reasoning, with six agents drawn from GPT-4o-mini and Gemini-2.5-Flash. We further investigate whether fine-tuning on structured logic puzzles transfers to improved in-game reasoning and gameplay. Across 18 simulated games, agents achieve only four correct wins, indicating difficulty in maintaining consistent deductive reasoning over the course of a full game. Additionally, we find that fine-tuning does not reliably improve performance and, in some cases, appears to increase reasoning volume without improving reasoning precision.
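
To make "multi-step deductive reasoning" concrete, the following is a minimal sketch of the kind of card elimination a Clue player has to maintain over a game. It is an illustration only, not the authors' environment or agent code: the reduced deck, the `Notebook` class, and its method names are all hypothetical, and it assumes the standard rule that every opponent is asked to refute a suggestion.

```python
from dataclasses import dataclass, field

# Hypothetical reduced deck, chosen only to keep the example short.
SUSPECTS = {"Scarlett", "Mustard", "Plum"}
WEAPONS = {"Rope", "Knife"}
ROOMS = {"Kitchen", "Study"}


@dataclass
class Notebook:
    """Tracks which cards could still be in the hidden solution envelope."""

    candidates: dict = field(default_factory=lambda: {
        "suspect": set(SUSPECTS),
        "weapon": set(WEAPONS),
        "room": set(ROOMS),
    })

    def see_card(self, card):
        """A card seen in any player's hand cannot be in the envelope."""
        for options in self.candidates.values():
            options.discard(card)

    def nobody_refuted(self, suspect, weapon, room, own_hand):
        """If no opponent can refute a suggestion, every suggested card
        that is not in our own hand must be in the envelope."""
        suggestion = (("suspect", suspect), ("weapon", weapon), ("room", room))
        for category, card in suggestion:
            if card not in own_hand:
                self.candidates[category] = {card}

    def solution(self):
        """Return the solution once each category has one candidate left."""
        if all(len(options) == 1 for options in self.candidates.values()):
            return {cat: next(iter(opts)) for cat, opts in self.candidates.items()}
        return None


# Step 1: rule out the cards in our own hand.
hand = {"Plum", "Knife"}
notebook = Notebook()
for card in hand:
    notebook.see_card(card)

# Step 2: an opponent shows us "Mustard" in response to a suggestion.
notebook.see_card("Mustard")
early_guess = notebook.solution()  # the room is still ambiguous here

# Step 3: we suggest two of our own cards plus "Study"; nobody refutes.
notebook.nobody_refuted("Plum", "Knife", "Study", hand)
final_guess = notebook.solution()
```

No single observation identifies the solution in this sketch; it emerges only after three separate facts are combined, which is the kind of consistency over a full game that the abstract reports agents struggling to maintain.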

Taxonomy

Topics: Artificial Intelligence in Games · Multi-Agent Systems and Negotiation · Explainable Artificial Intelligence (XAI)