O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning
Haotian Luo, Li Shen, Haiying He, Yibo Wang, Shiwei Liu, Wei Li, Naiqiang Tan, Xiaochun Cao, and Dacheng Tao

TL;DR
O1-Pruner is a fine-tuning method that reduces inference time of long-thought reasoning LLMs by encouraging shorter reasoning processes without sacrificing accuracy, demonstrated on mathematical benchmarks.
Contribution
It introduces Length-Harmonizing Fine-Tuning (O1-Pruner), a novel RL-style approach to optimize reasoning length and efficiency in large language models.
Findings
Significantly reduces inference overhead in reasoning models.
Achieves higher accuracy on mathematical reasoning benchmarks.
Effectively balances reasoning length and accuracy.
Abstract
Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder over complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to effectively allocate token budgets based on problem difficulty and reasoning redundancies. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This effective fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses…
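The pre-sampling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes that "baseline performance" for one problem means the mean reasoning length and mean accuracy over several solutions sampled from the unmodified model; the abstract does not confirm that definition. The names `Sample`, `Baseline`, and `estimate_baseline` are hypothetical and do not come from the paper or its code, and the numbers are made up for illustration. How the baseline is used afterwards is cut off in the abstract above and is not reconstructed here.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass(frozen=True)
class Sample:
    """One pre-sampled solution from the reference model (hypothetical type)."""
    num_tokens: int    # length of the sampled reasoning trace, in tokens
    is_correct: bool   # whether the final answer matched the reference answer


@dataclass(frozen=True)
class Baseline:
    """Per-problem baseline statistics (assumed definition, not from the paper)."""
    mean_length: float
    accuracy: float


def estimate_baseline(samples: Sequence[Sample]) -> Baseline:
    """Average length and accuracy over the pre-sampled solutions of one problem."""
    if not samples:
        raise ValueError("need at least one pre-sampled solution")
    return Baseline(
        mean_length=mean([s.num_tokens for s in samples]),
        accuracy=mean([1.0 if s.is_correct else 0.0 for s in samples]),
    )


# Illustrative data only: four sampled solutions to a single problem.
pre_samples = [
    Sample(1200, True),
    Sample(800, True),
    Sample(1000, False),
    Sample(1000, True),
]
baseline = estimate_baseline(pre_samples)
```

Under these assumptions, a later solution to the same problem could be compared against `baseline.mean_length` and `baseline.accuracy` to judge whether it is shorter without being less accurate.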
Taxonomy
Topics: AI-based Problem Solving and Planning · Constraint Satisfaction and Optimization
Methods: ADaptive gradient method with the OPTimal convergence rate
