On the Role of Reasoning Patterns in the Generalization Discrepancy of Long Chain-of-Thought Supervised Fine-Tuning

Zhaoyi Li; Xiangyu Xi; Zhengyu Chen; Wei Wang; Gangwei Jiang; Ranran Shen; Linqi Song; Ying Wei; Defu Lian

arXiv:2604.01702 · cs.CL · April 7, 2026

TL;DR

This study shows that the reasoning patterns in long-CoT training data significantly affect model generalization: divergent, exploratory trajectories hurt performance despite yielding low training loss.

Contribution

The paper uncovers how reasoning patterns in CoT trajectories influence generalization and proposes filtering out divergent trajectories to improve reasoning performance.

Findings

1. Models trained on convergent, deductive trajectories generalize better.
2. Filtering out frequently branching trajectories improves reasoning accuracy.
3. Training on selected data subsets boosts performance on multiple benchmarks.
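The idea of filtering out frequently branching trajectories can be illustrated with a minimal sketch. This page does not say how the authors detect branching, so everything below is a hypothetical assumption rather than the paper's method: the cue-phrase list BRANCH_CUES, the "cues per 1,000 words" rate, the threshold max_rate, and the names Trajectory, branch_rate, and filter_divergent.

```python
import re
from dataclasses import dataclass
from typing import List, Sequence

# HYPOTHETICAL cue phrases that often mark a trajectory switching to a new
# branch of reasoning. The paper's actual branching criterion is not stated here.
BRANCH_CUES = ("alternatively", "wait", "another approach", "let me try")

# Match any cue as a whole word/phrase, case-insensitively.
_CUE_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(c) for c in BRANCH_CUES) + r")\b",
    re.IGNORECASE,
)


@dataclass
class Trajectory:
    problem_id: str
    text: str


def branch_rate(text: str) -> float:
    """Number of branching cues per 1,000 whitespace-separated words (0.0 if empty)."""
    n_words = len(text.split())
    if n_words == 0:
        return 0.0
    n_cues = len(_CUE_PATTERN.findall(text))
    return 1000.0 * n_cues / n_words


def filter_divergent(trajs: Sequence[Trajectory], max_rate: float) -> List[Trajectory]:
    """Keep only trajectories whose branching rate does not exceed max_rate."""
    return [t for t in trajs if branch_rate(t.text) <= max_rate]


# Usage on two toy trajectories (illustrative data, not from the paper).
data = [
    Trajectory("p1", "First compute x = 2. Then y = x + 3 = 5. So the answer is 5."),
    Trajectory("p2", "Try x = 2. Wait, that fails. Alternatively, set x = 3. Wait, check again."),
]
kept = filter_divergent(data, max_rate=50.0)
print([t.problem_id for t in kept])  # ['p1']
```

The convergent trajectory p1 contains no cues and is kept, while p2 has 3 cues in 15 words (a rate of 200 per 1,000 words) and is dropped. A cue-phrase count is only a crude proxy for divergent reasoning; a real pipeline would more likely use whatever structural analysis of branching the authors describe in the full paper.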

Abstract

Supervised Fine-Tuning (SFT) on long Chain-of-Thought (CoT) trajectories has become a pivotal phase in building large reasoning models. However, how CoT trajectories from different sources influence the generalization performance of the resulting models remains an open question. In this paper, we conduct a comparative study using two sources of verified CoT trajectories generated by two competing models, DeepSeek-R1-0528 and gpt-oss-120b, with their problem sets controlled to be identical. Despite the comparable performance of the two source models, we uncover a striking paradox: lower training loss does not translate to better generalization. SFT on DeepSeek-R1-0528 data achieves remarkably lower training loss, yet the resulting models generalize significantly worse on reasoning benchmarks than those trained on gpt-oss-120b data. To understand this paradox, we perform a…
