How Reasoning Evolves from Post-Training Data: An Empirical Study Using Chess

Lucas Dionisopoulos; Nicklas Majamaki; Prithviraj Ammanabrolu

arXiv:2604.05134·cs.LG·May 5, 2026

TL;DR

This study tracks how a language model's reasoning evolves from supervised fine-tuning (SFT) through reinforcement learning (RL) in chess, examining which post-training datasets yield faithful reasoning and which SFT-checkpoint metrics predict post-RL performance.

Contribution

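The paper shows that fine-tuning on direct best-move prediction gives the strongest downstream performance but elicits unfaithful reasoning during RL, whereas training on multi-move trajectories gives comparable performance with faithful reasoning and more stable RL. It also analyzes which SFT-checkpoint metrics predict post-RL performance.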

Findings

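01 Fine-tuning on best-move prediction leads to effective RL and the strongest downstream performance, but the RL stage elicits unfaithful reasoning.

02 Training on multi-move trajectories yields comparable downstream performance with faithful reasoning and more stable RL.

03 Several SFT-checkpoint metrics (evaluation performance, hallucination rates, reasoning quality) are predictive of post-RL model performance.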

Abstract

We study how reasoning evolves in a language model, from supervised fine-tuning (SFT) to reinforcement learning (RL), by analyzing how a set of theoretically inspired datasets influences language model performance in chess. We find that fine-tuning a model to directly predict the best move leads to effective RL and the strongest downstream performance; however, the RL stage elicits unfaithful reasoning (reasoning inconsistent with the chosen move). Alternatively, training on multi-move trajectories yields comparable downstream performance with faithful reasoning and more stable RL. We analyze multiple qualitative and quantitative measures and highlight how these evolve from SFT through RL; we find several SFT-checkpoint metrics (spanning evaluation performance, hallucination rates, and reasoning quality) to be predictive of post-RL model performance. Finally, we…
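The abstract defines unfaithful reasoning as reasoning inconsistent with the chosen move. As a rough illustration of how such a mismatch could be flagged automatically, the sketch below treats a sample as faithful when the last move mentioned in the reasoning text equals the final answer. This heuristic, the UCI move format, and the names `is_faithful` and `unfaithful_rate` are assumptions made for illustration; they are not taken from the paper or its repository, and the authors' actual measure may differ.

```python
import re

# Assumption: moves are written in UCI notation (e.g. "e2e4", "e7e8q").
UCI_MOVE = re.compile(r"\b[a-h][1-8][a-h][1-8][qrbn]?\b")


def is_faithful(reasoning: str, final_move: str) -> bool:
    """Hypothetical heuristic: the last move mentioned in the reasoning
    must equal the move the model finally outputs."""
    mentioned = UCI_MOVE.findall(reasoning)
    return bool(mentioned) and mentioned[-1] == final_move.strip().lower()


def unfaithful_rate(samples: list[tuple[str, str]]) -> float:
    """Fraction of (reasoning, final_move) pairs flagged as unfaithful."""
    if not samples:
        return 0.0
    flagged = sum(1 for reasoning, move in samples if not is_faithful(reasoning, move))
    return flagged / len(samples)


if __name__ == "__main__":
    # Made-up example strings, not model outputs from the paper.
    samples = [
        ("I consider e2e4, but g1f3 is better.", "g1f3"),
        ("I consider e2e4, but g1f3 is better.", "e2e4"),
    ]
    print(unfaithful_rate(samples))
```

A string-matching check like this only catches the most direct inconsistencies; it cannot tell whether the reasoning actually supports the move it ends on.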

Code & Models

Repositories

lucasdino/lang-chess
github

Datasets

lucasdino/chess-reasoning-data
dataset · 100 downloads
