Reasoning Capabilities of Large Language Models. Lessons Learned from General Game Playing

Maciej Świechowski; Adam Żychowski; Jacek Mańdziuk

arXiv:2602.19160·cs.AI·February 24, 2026

Open Access

TL;DR

This paper evaluates the reasoning abilities of large language models within rule-based environments using General Game Playing tasks, revealing their strengths, limitations, and common errors in formal reasoning.

Contribution

It introduces a comprehensive analysis of LLM reasoning in formal, rule-based settings, highlighting performance patterns, structural game features, and reasoning errors.

Findings

01 Models perform well across most tasks

02 Performance decreases with longer game horizons

03 Common errors include hallucinated rules and syntactic mistakes

Abstract

This paper examines the reasoning capabilities of Large Language Models (LLMs) from a novel perspective, focusing on their ability to operate within formally specified, rule-governed environments. We evaluate four LLMs (Gemini 2.5 Pro and Flash variants, Llama 3.3 70B, and GPT-OSS 120B) on a suite of forward-simulation tasks, including next-state and multistep state formulation and legal action generation, across a diverse set of reasoning problems illustrated through General Game Playing (GGP) game instances. Beyond reporting instance-level performance, we characterize games based on 40 structural features and analyze correlations between these features and LLM performance. Furthermore, we investigate the effects of various game obfuscations to assess the role of linguistic semantics in game definitions and the impact of potential prior exposure of LLMs to specific games during training. The…
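To make the three forward-simulation tasks named in the abstract concrete, here is a minimal sketch on a toy token-taking game. The game, the `State` type, and every function name are hypothetical illustrations rather than anything taken from the paper, and the rules are written in plain Python instead of a formal game description. The exact-match check at the end is likewise only an assumption about how a model's answer might be compared against a reference.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Hypothetical toy game state (not from the paper)."""
    tokens: int  # tokens left on the table
    player: int  # 0 or 1, the player to move


def legal_actions(state: State) -> list[int]:
    """Legal action generation: the mover may take 1, 2, or 3 tokens."""
    return [n for n in (1, 2, 3) if n <= state.tokens]


def next_state(state: State, action: int) -> State:
    """Next-state task: apply one action and pass the turn."""
    if action not in legal_actions(state):
        raise ValueError(f"illegal action {action} in {state}")
    return State(state.tokens - action, 1 - state.player)


def simulate(state: State, actions: list[int]) -> State:
    """Multistep task: apply a sequence of actions in order."""
    for action in actions:
        state = next_state(state, action)
    return state


def is_correct(predicted: State, reference: State) -> bool:
    """Assumed scoring rule: the predicted state must match exactly."""
    return predicted == reference


if __name__ == "__main__":
    start = State(tokens=5, player=0)
    print(legal_actions(start))
    print(next_state(start, 3))
    print(simulate(start, [3, 1]))
```

In this framing, a model would receive the rules and a state as text and be asked to produce what `legal_actions`, `next_state`, or `simulate` returns. A longer action sequence passed to `simulate` corresponds to a longer horizon.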

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Natural Language Processing Techniques