TextGames: Learning to Self-Play Text-Based Puzzle Games via Language Model Reasoning

Frederikus Hudi; Genta Indra Winata; Ruochen Zhang; Alham Fikri Aji

arXiv:2502.18431 · cs.CL · February 26, 2025

TL;DR

This paper introduces TextGames, a benchmark for evaluating large language models' reasoning abilities in text-based puzzle games, revealing their strengths and limitations across various reasoning tasks.

Contribution

The paper presents a new benchmark, TextGames, designed to assess LLMs' reasoning skills in complex text-based puzzles, and analyzes their performance in single-turn and multi-turn scenarios.

Findings

01 LLMs perform well on easy and medium tasks but struggle with difficult ones.

02 Self-reflection improves multi-turn reasoning performance.

03 Reasoning-optimized models outperform instruction-following models.
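
The multi-turn setting with feedback mentioned in the findings can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy puzzle (find a string of a given length whose letter values sum to a target), the names `verify`, `play`, and `scripted_solver`, and the turn limit are hypothetical and are not taken from the benchmark or its code. The scripted solver stands in for an LLM call.

```python
from typing import Callable, List, Optional, Tuple

# One (answer, feedback) pair per failed turn.
History = List[Tuple[str, str]]


def verify(answer: str, length: int, target: int) -> Optional[str]:
    """Toy rule-based checker (hypothetical, not a benchmark game).

    Returns None when the answer is valid, otherwise a feedback message.
    Letter values are a=1, b=2, ..., z=26.
    """
    well_formed = (
        len(answer) == length
        and answer.isascii()
        and answer.isalpha()
        and answer.islower()
    )
    if not well_formed:
        return f"Answer must be exactly {length} lowercase letters."
    total = sum(ord(c) - ord("a") + 1 for c in answer)
    if total != target:
        return f"Letter values sum to {total}, expected {target}."
    return None


def play(
    solver: Callable[[History], str],
    length: int,
    target: int,
    max_turns: int = 3,
) -> Tuple[bool, int]:
    """Run up to max_turns attempts, feeding verifier feedback back to the solver.

    Returns (solved, number_of_turns_used).
    """
    history: History = []
    for turn in range(1, max_turns + 1):
        answer = solver(history)
        feedback = verify(answer, length, target)
        if feedback is None:
            return True, turn
        history.append((answer, feedback))
    return False, max_turns


def scripted_solver(history: History) -> str:
    """Stand-in for a model: wrong on the first try, corrected after feedback."""
    return "abc" if not history else "abd"


if __name__ == "__main__":
    print(play(scripted_solver, length=3, target=7, max_turns=1))  # single turn
    print(play(scripted_solver, length=3, target=7, max_turns=3))  # multi-turn
```

With `max_turns=1` the loop reduces to single-turn evaluation; larger values let the solver revise its answer using the accumulated feedback.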

Abstract

Reasoning is a fundamental capability of large language models (LLMs), enabling them to comprehend, analyze, and solve complex problems. In this paper, we introduce TextGames, an innovative benchmark specifically crafted to assess LLMs through demanding text-based games that require advanced skills in pattern recognition, spatial awareness, arithmetic, and logical reasoning. Our analysis probes LLMs' performance in both single-turn and multi-turn reasoning, and their abilities in leveraging feedback to correct subsequent answers through self-reflection. Our findings reveal that, although LLMs exhibit proficiency in addressing most easy and medium-level problems, they face significant challenges with more difficult tasks. In contrast, humans are capable of solving all tasks when given sufficient time. Moreover, we observe that LLMs show improved performance in multi-turn predictions…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling