Disentangling generalization and memorization in large language models using chess

Leonard S. Pleiss; Maximilian Schiffer; Robert K. von Weizsaecker

arXiv:2601.16823·cs.CL·May 20, 2026

TL;DR

This paper uses chess as a testbed to analyze whether large language models rely on memorization or genuine reasoning, revealing limitations in their ability to generalize when relevant priors are scarce.

Contribution

It introduces a taxonomy of chess positions that distinguishes memorization from reasoning in LLMs without requiring knowledge of the models' training data.

Findings

01 Performance drops as the density of relevant priors decreases.

02 Models regress to the random baseline on tasks with sparse priors.

03 Reasoning-augmented inference offers limited gains when relevant priors are absent.

Abstract

Large Language Models (LLMs) exhibit remarkable capabilities, yet it remains unclear to what extent these reflect sophisticated recall or genuine reasoning ability. We introduce chess as a controlled testbed aimed at disentangling these faculties. Leveraging the game's structure and scalable engine evaluations, we construct a taxonomy of positions varying in density of relevant priors, ranging from common states solvable by memorization to completely novel ones requiring generalization. Crucially, our approach achieves this distinction without requiring explicit knowledge of the models' training data. Applying this taxonomy, we combine a longitudinal analysis of the GPT lineage with a rigorous evaluation of contemporary models, including Claude Opus and Gemini. Our analysis reveals a steep gradient: performance consistently degrades as the density of relevant priors decreases. Notably,…
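The kind of comparison the abstract describes, performance per prior-density category set against a random baseline, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Trial` record, the bucket labels `"common"` and `"novel"`, the best-move-match accuracy metric, and the toy data are hypothetical and are not taken from the paper, which may score moves differently (for example, by engine evaluation).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    bucket: str           # hypothetical prior-density label, e.g. "common" or "novel"
    model_move: str       # move returned by the model
    best_move: str        # move preferred by a chess engine
    num_legal_moves: int  # number of legal moves in the position


def summarize(trials):
    """Return {bucket: (accuracy, random_baseline)}.

    accuracy        = share of trials where the model matched the engine move
    random_baseline = expected accuracy of picking a legal move uniformly at random
    """
    grouped = defaultdict(list)
    for t in trials:
        grouped[t.bucket].append(t)

    summary = {}
    for bucket, items in grouped.items():
        accuracy = sum(t.model_move == t.best_move for t in items) / len(items)
        baseline = sum(1 / t.num_legal_moves for t in items) / len(items)
        summary[bucket] = (accuracy, baseline)
    return summary


# Toy data, made up purely to exercise the function.
trials = [
    Trial("common", "e2e4", "e2e4", 20),
    Trial("common", "g1f3", "g1f3", 20),
    Trial("novel", "a2a3", "d4d5", 40),
    Trial("novel", "h2h4", "c3e4", 40),
]
result = summarize(trials)
for bucket, (accuracy, baseline) in result.items():
    print(f"{bucket}: accuracy={accuracy:.3f}, random baseline={baseline:.3f}")
```

Reporting the baseline next to the accuracy for each category is what makes a statement such as "regresses to the random baseline" checkable: a category counts as regressed only when its accuracy is statistically indistinguishable from its own baseline, which depends on how many legal moves its positions offer.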

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI) · Topic Modeling