Explore the Reasoning Capability of LLMs in the Chess Testbed

Shu Wang; Lei Ji; Renxi Wang; Wenxiao Zhao; Haokun Liu; Yifan Hou; Ying Nian Wu

arXiv:2411.06655 · cs.CL · March 3, 2025

TL;DR

This paper enhances large language models' reasoning in chess by integrating expert-annotated strategy and tactics, demonstrating improved move selection and the benefit of language explanations.

Contribution

It introduces the MATE dataset with expert annotations and finetunes LLaMA-3-8B to improve chess reasoning, outperforming existing models.

Findings

01. Finetuned LLaMA-3-8B surpasses GPT, Claude, and Gemini in chess move selection.
02. Language explanations improve the reasoning capabilities of LLMs.
03. Expert-annotated strategy and tactic data enhances model performance.

Abstract

Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategy and tactic. Specifically, we collect a dataset named MATE, which consists of 1 million chess positions with candidate moves annotated by chess experts for strategy and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting…
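
To make the setup in the abstract concrete, the sketch below shows one way a position with annotated candidate moves could be turned into a move-selection prompt. It is an illustration under stated assumptions, not the authors' pipeline: the `Candidate` fields, the `build_prompt` helper, the prompt wording, and the example annotations are all hypothetical and are not taken from the MATE dataset.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One candidate move with free-text annotations (hypothetical fields)."""
    move: str       # move in standard algebraic notation, e.g. "e4"
    strategy: str   # long-term plan behind the move
    tactic: str     # short-term idea behind the move


def build_prompt(fen: str, candidates: list[Candidate]) -> str:
    """Format a position and its annotated candidates as a selection prompt."""
    lines = [f"Position (FEN): {fen}", "Candidate moves:"]
    for i, c in enumerate(candidates, start=1):
        lines.append(f"{i}. {c.move}")
        lines.append(f"   Strategy: {c.strategy}")
        lines.append(f"   Tactic: {c.tactic}")
    lines.append("Which candidate is the better move? Answer with the move only.")
    return "\n".join(lines)


# Made-up annotations on the standard starting position, for illustration only.
example = build_prompt(
    "rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1",
    [
        Candidate("e4", "Claims central space and opens lines for the bishop and queen.",
                  "No immediate tactic; the pawn is not attacked."),
        Candidate("a4", "Neglects the center and piece development.",
                  "No immediate tactic."),
    ],
)
print(example)
```

Keeping the strategy and tactic text as separate fields makes it easy to drop one or both from the prompt, which is the kind of comparison needed to check whether language explanations help move selection.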

Code & Models

Datasets

OutFlankShu/MATE_DATASET
dataset · 122 downloads

Videos

Explore the Reasoning Capability of LLMs in the Chess Testbed

Taxonomy

Topics: Sports Analytics and Performance · Educational Games and Gamification · Software Engineering Research

Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Attention Dropout · Multi-Head Attention · Residual Connection · Softmax · Byte Pair Encoding