TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning
Frederikus Hudi, Genta Indra Winata, Ruochen Zhang, Alham Fikri Aji

TL;DR
This paper introduces TextGames, a benchmark for evaluating large language models' reasoning abilities in text-based puzzle games, revealing their strengths and limitations across various reasoning tasks.
Contribution
The paper presents a new benchmark, TextGames, designed to assess LLMs' reasoning skills in complex text-based puzzles, and analyzes their performance in single-turn and multi-turn scenarios.
Findings
LLMs perform well on easy and medium tasks but struggle with difficult ones.
Self-reflection improves multi-turn reasoning performance.
Reasoning-optimized models outperform instruction-following models.
Abstract
Reasoning is a fundamental capability of large language models (LLMs), enabling them to comprehend, analyze, and solve complex problems. In this paper, we introduce TextGames, an innovative benchmark specifically crafted to assess LLMs through demanding text-based games that require advanced skills in pattern recognition, spatial awareness, arithmetic, and logical reasoning. Our analysis probes LLMs' performance in both single-turn and multi-turn reasoning, and their abilities in leveraging feedback to correct subsequent answers through self-reflection. Our findings reveal that, although LLMs exhibit proficiency in addressing most easy and medium-level problems, they face significant challenges with more difficult tasks. In contrast, humans are capable of solving all tasks when given sufficient time. Moreover, we observe that LLMs show improved performance in multi-turn predictions…
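The multi-turn setting described in the abstract, where an LLM receives feedback on a wrong answer and uses it to correct its next attempt, can be pictured as a simple retry loop. The sketch below assumes an automatic checker produces the feedback. It is not the TextGames implementation. The toy anagram puzzle, the `Verifier` and `Solver` interfaces, `play`, `max_turns`, and the scripted solver standing in for an LLM are all hypothetical. Setting `max_turns=1` reduces the loop to single-turn evaluation.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical verifier interface: returns (is_correct, feedback_message).
Verifier = Callable[[str], Tuple[bool, str]]
# Hypothetical solver interface: maps (puzzle prompt, feedback history) to an answer.
# In a real evaluation this would wrap an LLM call; here it is any callable.
Solver = Callable[[str, List[str]], str]


@dataclass
class EpisodeResult:
    solved: bool
    turns_used: int
    transcript: List[Tuple[str, str]] = field(default_factory=list)


def anagram_verifier(letters: str, vocabulary: set) -> Verifier:
    """Toy puzzle (hypothetical): use every given letter exactly once to form a word."""
    def verify(answer: str) -> Tuple[bool, str]:
        answer = answer.strip().lower()
        if sorted(answer) != sorted(letters):
            return False, f"'{answer}' does not use exactly the letters {sorted(letters)}."
        if answer not in vocabulary:
            return False, f"'{answer}' is not a valid word."
        return True, "Correct."
    return verify


def play(prompt: str, solver: Solver, verify: Verifier, max_turns: int = 3) -> EpisodeResult:
    """After each wrong answer, the checker's feedback is appended to the
    history so the solver can self-correct on the next turn."""
    feedback_history: List[str] = []
    transcript: List[Tuple[str, str]] = []
    for turn in range(1, max_turns + 1):
        answer = solver(prompt, feedback_history)
        ok, feedback = verify(answer)
        transcript.append((answer, feedback))
        if ok:
            return EpisodeResult(True, turn, transcript)
        feedback_history.append(feedback)
    return EpisodeResult(False, max_turns, transcript)


def scripted_solver(answers: List[str]) -> Solver:
    """Stand-in for an LLM (hypothetical): returns a fixed answer per turn."""
    def solve(prompt: str, history: List[str]) -> str:
        return answers[min(len(history), len(answers) - 1)]
    return solve


verify = anagram_verifier("tca", {"cat", "act"})
solver = scripted_solver(["tac", "cat"])
prompt = "Rearrange 'tca' into an English word."
single = play(prompt, solver, verify, max_turns=1)
multi = play(prompt, solver, verify, max_turns=3)
print(single.solved, multi.solved, multi.turns_used)  # False True 2
```

In this toy run the first answer uses the right letters but is not a word. With one turn the episode fails. With three turns the feedback lets the scripted solver succeed on turn 2, mirroring the kind of gain from self-reflection that the findings report.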
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
