Efficient Long CoT Reasoning in Small Language Models
Zhaoyang Wang, Jinqi Jiang, Tian Qiu, Hui Liu, Xianfeng Tang, Huaxiu Yao

TL;DR
This paper introduces a method to prune redundant steps in long chain-of-thought reasoning, enabling small language models to learn efficient reasoning with reduced redundancy while maintaining performance.
Contribution
It proposes a simple pruning technique and on-policy data curation to help small models learn effective long CoT reasoning without unnecessary steps.
Findings
Reduces redundant reasoning steps in small models' CoT
Maintains competitive performance with fewer reasoning steps
Effective across multiple mathematical reasoning benchmarks
Abstract
Recent large reasoning models such as DeepSeek-R1 exhibit strong complex problems solving abilities by generating long chain-of-thought (CoT) reasoning steps. It is challenging to directly train small language models (SLMs) to emerge long CoT. Thus, distillation becomes a practical method to enable SLMs for such reasoning ability. However, the long CoT often contains a lot of redundant contents (e.g., overthinking steps) which may make SLMs hard to learn considering their relatively poor capacity and generalization. To address this issue, we propose a simple-yet-effective method to prune unnecessary steps in long CoT, and then employ an on-policy method for the SLM itself to curate valid and useful long CoT training data. In this way, SLMs can effectively learn efficient long CoT reasoning and preserve competitive performance at the same time. Experimental results across a series of…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSemantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling
