Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

Bin Yu; Hang Yuan; Haotian Li; Xueyin Xu; Yuliang Wei; Bailing Wang; Weizhen Qi; Kai Chen

arXiv:2505.03469·cs.CL·May 22, 2025

Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models

Bin Yu, Hang Yuan, Haotian Li, Xueyin Xu, Yuliang Wei, Bailing Wang, Weizhen Qi, Kai Chen

PDF

Open Access 1 Repo

TL;DR

This paper introduces LS-Mixture SFT, a fine-tuning method that combines long and short reasoning datasets to improve accuracy and reduce verbosity in large language models' reasoning processes.

Contribution

The paper proposes a novel fine-tuning approach that mitigates overthinking in reasoning models by mixing long and short reasoning datasets, enhancing efficiency and accuracy.

Findings

01

Achieved 2.3% average accuracy improvement across benchmarks.

02

Reduced model response length by approximately 47.61%.

03

Enabled reasoning capabilities without inheriting overthinking from teacher models.

Abstract

Recent advances in large language models have demonstrated that Supervised Fine-Tuning (SFT) with Chain-of-Thought (CoT) reasoning data distilled from large reasoning models (e.g., DeepSeek R1) can effectively transfer reasoning capabilities to non-reasoning models. However, models fine-tuned with this approach inherit the "overthinking" problem from teacher models, producing verbose and redundant reasoning chains during inference. To address this challenge, we propose Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning (LS-Mixture SFT), which combines long CoT reasoning dataset with their short counterparts obtained through structure-preserved rewriting. Our experiments demonstrate that models trained using the LS-Mixture SFT method, compared to those trained with direct SFT, achieved an average accuracy improvement of 2.3% across various benchmarks while substantially reducing…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

zgca-ai4edu/ls-mixture
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Natural Language Processing Techniques

MethodsShrink and Fine-Tune