DRP: Distilled Reasoning Pruning with Skill-aware Step Decomposition for Efficient Large Reasoning Models
Yuxuan Jiang, Dawei Li, Francis Ferraro

TL;DR
DRP is a hybrid framework that enhances large reasoning models by pruning and distilling reasoning steps, significantly improving token efficiency while maintaining or boosting accuracy on mathematical reasoning tasks.
Contribution
DRP uses a teacher model to perform skill-aware step decomposition and content pruning, then distills the pruned reasoning paths into a student model, so the student reasons more efficiently without losing accuracy.
Findings
DRP reduces average token usage on GSM8K from 917 to 328 while improving accuracy.
DRP achieves a 43% token reduction on AIME without performance loss.
Aligning reasoning structure with model capacity is crucial for effective knowledge transfer.
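The GSM8K finding is reported as absolute token counts, while the AIME finding is reported as a relative reduction. The short sketch below converts the GSM8K counts into a percentage so the two can be read on the same scale. The helper name `token_reduction` is an illustrative assumption, not something defined in the paper.

```python
def token_reduction(before: float, after: float) -> float:
    """Return the fractional reduction in token count (hypothetical helper)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


# GSM8K average token counts as reported in the findings above.
gsm8k_reduction = token_reduction(917, 328)
print(f"GSM8K token reduction: {gsm8k_reduction:.1%}")  # prints "GSM8K token reduction: 64.2%"
```

Under this reading, the GSM8K reduction (about 64%) is larger than the 43% reported for AIME.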
Abstract
While Large Reasoning Models (LRMs) have demonstrated success in complex reasoning tasks through long chain-of-thought (CoT) reasoning, their inference often involves excessively verbose reasoning traces, resulting in substantial inefficiency. To address this, we propose Distilled Reasoning Pruning (DRP), a hybrid framework that combines inference-time pruning with tuning-based distillation, two widely used strategies for efficient reasoning. DRP uses a teacher model to perform skill-aware step decomposition and content pruning, and then distills the pruned reasoning paths into a student model, enabling it to reason both efficiently and accurately. Across several challenging mathematical reasoning datasets, we find that models trained with DRP achieve substantial improvements in token efficiency without sacrificing accuracy. Specifically, DRP reduces average token usage on GSM8K from…
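The pipeline described in the abstract (decompose a reasoning trace into skill-labelled steps, prune content, then use the pruned path as a distillation target) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: in DRP the decomposition and pruning are performed by a teacher model, whereas this sketch substitutes simple hand-written rules, and the names `Step`, `PRUNABLE_SKILLS`, `decompose`, `prune`, and `to_training_example` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    skill: str  # hypothetical skill label, e.g. "arithmetic"
    text: str   # content of the reasoning step


# Assumption: skills treated as prunable in this toy example only.
PRUNABLE_SKILLS = {"restate", "verify"}


def decompose(trace: str) -> list[Step]:
    """Toy stand-in for the teacher: one 'skill: text' step per non-empty line."""
    steps = []
    for line in trace.splitlines():
        line = line.strip()
        if not line:
            continue
        skill, _, text = line.partition(":")
        steps.append(Step(skill.strip(), text.strip()))
    return steps


def prune(steps: list[Step]) -> list[Step]:
    """Toy pruning rule: drop prunable skills and exact repeats of earlier steps."""
    kept, seen = [], set()
    for step in steps:
        if step.skill in PRUNABLE_SKILLS or step.text in seen:
            continue
        seen.add(step.text)
        kept.append(step)
    return kept


def to_training_example(question: str, steps: list[Step], answer: str) -> dict:
    """Build a prompt/completion pair that a student model could be fine-tuned on."""
    completion = "\n".join(s.text for s in steps) + f"\nAnswer: {answer}"
    return {"prompt": question, "completion": completion}


trace = """
restate: We need the total cost of 3 pens at 4 dollars each.
arithmetic: 3 * 4 = 12
verify: Checking again, 4 + 4 + 4 = 12.
arithmetic: 3 * 4 = 12
"""
pruned = prune(decompose(trace))
example = to_training_example("What do 3 pens cost at 4 dollars each?", pruned, "12")
print(example["completion"])
```

The point of the sketch is the data flow only: the student is trained on the shortened path rather than on the original verbose trace. How the teacher actually assigns skills and decides what to prune is not specified in this summary.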
