How to Fine-Tune a Reasoning Model? A Teacher-Student Cooperation Framework to Synthesize Student-Consistent SFT Data

Zixian Huang; Kaichen Yang; Xu Huang; Feiyang Hao; Qiming Ge; Bowen Li; He Du; Kai Chen; Qipeng Guo

arXiv:2604.14164 · cs.CL · April 22, 2026

TL;DR

This paper introduces TESSY, a framework that improves fine-tuning of reasoning models by synthesizing data that balances teacher expertise with student style, leading to better reasoning performance.

Contribution

The paper proposes TESSY, a novel teacher-student cooperation method for data synthesis that improves the fine-tuning of reasoning models by keeping the synthesized data stylistically consistent with the student model.

Findings

01. Fine-tuning on TESSY-synthesized data outperforms fine-tuning on conventional teacher-generated data on reasoning tasks.
02. Fine-tuning on TESSY data improves performance on code generation benchmarks.
03. Interleaving teacher and student generation reduces stylistic divergence and improves reasoning capabilities.

Abstract

A widely adopted strategy for model enhancement is to use synthetic data generated by a stronger model for supervised fine-tuning (SFT). However, for emerging reasoning models like Qwen3-8B, this approach often fails to improve reasoning capabilities and can even cause a substantial drop in performance. In this work, we identify substantial stylistic divergence between teacher-generated data and the student's distribution as a major factor affecting SFT. To bridge this gap, we propose a Teacher-Student Cooperation Data Synthesis framework (TESSY), which interleaves the teacher and student models so that they alternately generate style and non-style tokens. As a result, TESSY produces synthetic sequences that inherit the teacher's advanced reasoning capabilities while remaining stylistically consistent with the student's distribution. In experiments on code generation using GPT-OSS-120B…
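
The abstract describes TESSY's core mechanism only at a high level: teacher and student alternate, with the student producing style tokens and the teacher producing non-style tokens. The Python sketch below illustrates that idea under loudly stated assumptions and is not the implementation in coopreason/TESSY. It assumes a fixed, hypothetical style vocabulary (STYLE_TOKENS) as the style classifier, treats each model as a greedy next-token function, and switches models token by token; the names STYLE_TOKENS, is_style, interleave, and scripted are all illustrative. How the paper actually identifies style tokens and hands off between models is not stated on this page.

```python
from typing import Callable, List, Tuple

# A "model" is any function mapping the full token sequence so far to its next token.
NextToken = Callable[[List[str]], str]

# HYPOTHETICAL style vocabulary (an assumption, not from the paper): discourse
# markers that shape how a reasoning trace reads rather than what it computes.
STYLE_TOKENS = {"Okay,", "Hmm,", "Wait,", "Alright,", "So,"}


def is_style(token: str) -> bool:
    """Assumed style classifier: a token counts as 'style' iff it is in STYLE_TOKENS."""
    return token in STYLE_TOKENS


def interleave(teacher: NextToken, student: NextToken, prompt: List[str],
               max_new: int = 64, eos: str = "<eos>") -> List[Tuple[str, str]]:
    """Greedy per-token interleaving: the student supplies style tokens, the teacher the rest.

    Returns the generated tokens, each tagged with the model that produced it.
    """
    seq = list(prompt)
    out: List[Tuple[str, str]] = []
    for _ in range(max_new):
        proposal = student(seq)
        if is_style(proposal):
            # Keep the student's own phrasing for style tokens.
            token, source = proposal, "student"
        else:
            # Defer content to the teacher. Note: this simple rule still lets the
            # teacher emit its own style tokens here; a fuller scheme needs a policy.
            token, source = teacher(seq), "teacher"
        if token == eos:
            break
        seq.append(token)
        out.append((token, source))
    return out


def scripted(tokens: List[str], offset: int) -> NextToken:
    """Toy stand-in for a model: replays a fixed script; offset is the prompt length."""
    def step(seq: List[str]) -> str:
        i = len(seq) - offset
        return tokens[i] if i < len(tokens) else "<eos>"
    return step


prompt = ["Q:", "sum", "1..n"]
teacher = scripted(["Hmm,", "use", "n(n+1)/2", "<eos>"], offset=len(prompt))
student = scripted(["Okay,", "loop", "over", "<eos>"], offset=len(prompt))
print(interleave(teacher, student, prompt))
# → [('Okay,', 'student'), ('use', 'teacher'), ('n(n+1)/2', 'teacher')]
```

In this toy run, the teacher's own opener "Hmm," is displaced by the student's "Okay,", while the solution content comes from the teacher. A real pipeline would also need to handle details this sketch ignores, such as the two models using different tokenizers.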

Code & Models

Repositories

coopreason/TESSY (GitHub)
