Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem
Yubo Wang, Ping Nie, Kai Zou, Lijun Wu, Wenhu Chen

TL;DR
This paper introduces Critique Fine-Tuning (CFT), a simple and efficient method that significantly enhances the reasoning abilities of large language models by fine-tuning on critique data from a single problem, with results comparable to or better than costly reinforcement learning (RL).
Contribution
The work shows that critique fine-tuning on a single problem (one-shot CFT) can effectively unlock the reasoning potential of large language models while substantially reducing compute cost compared with RL.
Findings
CFT improves average reasoning performance by 15-16% across the evaluated benchmarks.
CFT needs only about 5 GPU hours of training, far less than RL.
CFT's gains hold across different choices of the single training problem.
Abstract
We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models' reasoning capabilities. However, RL is not only expensive but also unstable. Even one-shot RL requires hundreds of GPU hours. This raises a critical question: Is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama…
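The data-construction step described in the abstract (diverse candidate solutions to one problem, each critiqued by teacher LLMs, then used for fine-tuning) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the prompt template, the `teachers` callables (stand-ins for teacher-LLM calls), and the toy tokenizer are all hypothetical. Masking prompt tokens with -100 follows the common PyTorch convention (the default `ignore_index` of `CrossEntropyLoss`), so that only the critique is learned; this page does not confirm that detail.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical prompt template; the paper's exact wording is not given on this page.
PROMPT_TEMPLATE = (
    "Problem:\n{problem}\n\n"
    "Candidate solution:\n{solution}\n\n"
    "Critique the solution step by step and state whether it is correct."
)

# Label value skipped by PyTorch's cross-entropy loss by default.
IGNORE_INDEX = -100


@dataclass
class CritiqueExample:
    prompt: str  # problem + one candidate solution (model input)
    target: str  # teacher critique (what the model learns to generate)


def build_cft_examples(
    problem: str,
    solutions: List[str],
    teachers: List[Callable[[str, str], str]],
) -> List[CritiqueExample]:
    """Pair every candidate solution with a critique from every (hypothetical) teacher."""
    examples = []
    for solution in solutions:
        prompt = PROMPT_TEMPLATE.format(problem=problem, solution=solution)
        for teacher in teachers:
            examples.append(CritiqueExample(prompt, teacher(problem, solution)))
    return examples


def to_training_ids(
    ex: CritiqueExample, tokenize: Callable[[str], List[int]]
) -> Dict[str, List[int]]:
    """Concatenate prompt and critique; only critique tokens contribute to the loss."""
    prompt_ids = tokenize(ex.prompt)
    target_ids = tokenize(ex.target)
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [IGNORE_INDEX] * len(prompt_ids) + target_ids,
    }


# Usage with a toy teacher and a toy character-level tokenizer (both hypothetical).
def toy_teacher(problem: str, solution: str) -> str:
    verdict = "correct" if solution.strip().endswith("4") else "incorrect"
    return f"The final answer is {verdict}."


examples = build_cft_examples("What is 2 + 2?", ["2 + 2 = 4", "2 + 2 = 5"], [toy_teacher])
print(len(examples))  # 2
print(examples[1].target)  # The final answer is incorrect.
batch = [to_training_ids(ex, lambda s: [ord(c) for c in s]) for ex in examples]
```

Keeping one example per (solution, teacher) pair mirrors the abstract's idea that diversity comes from many solutions and critiques of the same problem, not from many problems.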
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Evolutionary Algorithms and Applications · Topic Modeling
Methods: Balanced Selection · LLaMA
