Explore the Reasoning Capability of LLMs in the Chess Testbed
Shu Wang, Lei Ji, Renxi Wang, Wenxiao Zhao, Haokun Liu, Yifan Hou, Ying Nian Wu

TL;DR
This paper enhances large language models' reasoning in chess by integrating expert-annotated strategy and tactics, demonstrating improved move selection and the benefit of language explanations.
Contribution
The paper introduces MATE, a dataset of chess positions with expert annotations, and finetunes LLaMA-3-8B on it; the finetuned model outperforms GPT, Claude, and Gemini at chess move selection.
Findings
Finetuned LLaMA-3-8B surpasses GPT, Claude, and Gemini in chess move selection.
Language explanations improve reasoning capabilities of LLMs.
Expert-annotated strategy and tactic data enhances model performance.
Abstract
Reasoning is a central capability of human intelligence. In recent years, with the advent of large-scale datasets, pretrained large language models have emerged with new capabilities, including reasoning. However, these models still struggle with long-term, complex reasoning tasks, such as playing chess. Based on the observation that expert chess players employ a dual approach combining long-term strategic play with short-term tactical play along with language explanation, we propose improving the reasoning capability of large language models in chess by integrating annotated strategy and tactic. Specifically, we collect a dataset named MATE, which consists of 1 million chess positions with candidate moves annotated by chess experts for strategy and tactics. We finetune the LLaMA-3-8B model and compare it against state-of-the-art commercial language models in the task of selecting…
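To make the task setup concrete, the following is a minimal sketch of how a position with annotated candidate moves might be turned into a text prompt for a language model. The record layout, the field names (`move`, `strategy`, `tactic`), the prompt wording, and the example annotations are all assumptions made for illustration; they are not the actual MATE schema or the authors' prompt.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate move with its annotations (hypothetical layout)."""
    move: str      # move in algebraic notation (notation choice is an assumption)
    strategy: str  # long-term strategic explanation
    tactic: str    # short-term tactical explanation


def build_prompt(fen: str, candidates: List[Candidate]) -> str:
    """Format a position and its annotated candidates as a plain-text prompt."""
    lines = [f"Position (FEN): {fen}", "Candidate moves:"]
    for i, c in enumerate(candidates, start=1):
        lines.append(f"{i}. {c.move}")
        lines.append(f"   Strategy: {c.strategy}")
        lines.append(f"   Tactic: {c.tactic}")
    lines.append("Which candidate move is better? Answer with the move only.")
    return "\n".join(lines)


# Position after 1. e4 e5 2. Nf3 Nc6; the annotations below are invented
# for this sketch and are not taken from the dataset.
example = build_prompt(
    "r1bqkbnr/pppp1ppp/2n5/4p3/4P3/5N2/PPPP1PPP/RNBQKB1R w KQkq - 2 3",
    [
        Candidate("Bb5", "Pressure the knight that defends e5.",
                  "Keep the option of exchanging on c6."),
        Candidate("Bc4", "Aim the bishop at the f7 square.",
                  "Clear the way for kingside castling."),
    ],
)
print(example)
```

Dropping the `Strategy` and `Tactic` lines from the prompt would give a moves-only variant, which is one way the effect of language explanations could be compared.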
Taxonomy
Topics: Sports Analytics and Performance · Educational Games and Gamification · Software Engineering Research
Methods: Attention Is All You Need · Linear Layer · Cosine Annealing · Adam · Attention Dropout · Multi-Head Attention · Residual Connection · Softmax · Byte Pair Encoding
