Generalization or Memorization? Brittleness Testing for Chess-Trained Language Models

Ethan Tang

arXiv:2605.17565·cs.AI·May 19, 2026

TL;DR

This paper critically evaluates chess-trained language models, arguing that their benchmark performance is largely explained by pattern-matching, and shows that verifier-in-the-loop frameworks substantially improve move accuracy and validity, offering a flexible alternative to domain-specific training.

Contribution

It introduces KinGPT, a 25M-parameter character-level model trained only on (position, best-move) pairs, and shows how verifier-in-the-loop methods improve performance, challenging claims of understanding made for existing models.

Findings

01

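KinGPT (25M parameters) outperforms the 3B-parameter ChessGPT on a 600-puzzle mate-in-N suite and the 4B-parameter C1-4B on a 20-theme puzzle benchmark.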

02

LLM-Modulo, a verifier-in-the-loop framework, raises RedPajama 3B's best-move accuracy from 1.2% to 21.2% and move-generation validity from 19.3% to 95.3%.

03

Code and models are released as open source for reproducibility.

Abstract

Recent work has fine-tuned language models on chess data and reported high benchmark scores as evidence that the resulting models can understand the rules of chess, play full chess games at a professional level, or generate human-readable explanations grounded in expert knowledge. We introduce KinGPT, a 25M-parameter character-level language model trained only on (position, best-move) pairs, which outperforms the 3B-parameter ChessGPT on a 600-puzzle mate-in-N suite and the 4B-parameter C1-4B on a 20-theme puzzle benchmark. We examine several claims made in the existing literature regarding chess-trained language models and assert that their impressive benchmark performance is largely explained by pattern-matching. We also demonstrate how LLM-Modulo, a verifier-in-the-loop framework, raises RedPajama 3B's best-move accuracy from 1.2% to 21.2% and move-generation validity from 19.3% to 95.3% on mate-in-N…
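
The verifier-in-the-loop idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's LLM-Modulo implementation: the toy move lists, the stub "model", and the names `verify`, `make_stub_model`, and `verifier_in_the_loop` are all hypothetical, chosen only to show the general generate, check, critique, retry pattern.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical toy position: these move sets are invented for illustration
# and are not taken from the paper or from any real chess position.
LEGAL_MOVES = {"e2e4", "g1f3", "d1h5"}
MATING_MOVE = "d1h5"


def verify(move: str) -> Tuple[bool, str]:
    """Stand-in verifier: accept only a legal move that delivers mate."""
    if move not in LEGAL_MOVES:
        return False, "illegal move"
    if move != MATING_MOVE:
        return False, "legal, but does not deliver mate"
    return True, ""


def make_stub_model(guesses: List[str]) -> Callable[[List[str]], str]:
    """Stand-in for a language model: one canned guess per critique received."""
    def propose(feedback: List[str]) -> str:
        # Repeat the last guess once the canned list is exhausted.
        return guesses[min(len(feedback), len(guesses) - 1)]
    return propose


def verifier_in_the_loop(
    propose: Callable[[List[str]], str],
    check: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Ask for a move, verify it, and feed critiques back until accepted."""
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = propose(feedback)
        ok, critique = check(candidate)
        if ok:
            return candidate, round_no
        # The critique becomes extra context for the next proposal.
        feedback.append(f"{candidate}: {critique}")
    return None, max_rounds


result = verifier_in_the_loop(make_stub_model(["e2e5", "e2e4", "d1h5"]), verify)
print(result)
```

In a realistic setup, `propose` would presumably call the language model with the accumulated critiques appended to the prompt, and `verify` would be backed by a chess rules engine rather than a hard-coded set. Both are assumptions here, not details confirmed by the abstract.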

Code & Models

Repositories

ethanjtang/GAMBIT (GitHub)

Models

ethanjtang/KinGPT (Hugging Face model)

Datasets
