Unleashing the Reasoning Potential of Pre-trained LLMs by Critique Fine-Tuning on One Problem

Yubo Wang; Ping Nie; Kai Zou; Lijun Wu; Wenhu Chen

arXiv:2506.03295·cs.CL·June 6, 2025

TL;DR

This paper shows that Critique Fine-Tuning (CFT) on critique data built from a single problem is a simple and efficient way to strengthen the reasoning abilities of large language models, at a fraction of the compute cost of reinforcement learning.

Contribution

The work demonstrates that one-shot critique fine-tuning on a single problem can effectively unlock the reasoning potential of large language models, reducing compute costs substantially.

Findings

01 CFT improves reasoning performance by 15-16% on benchmarks.

02 CFT requires only 5 GPU hours, whereas even one-shot RL requires hundreds.

03 CFT is robust to the choice of training problem.

Abstract

We have witnessed that strong LLMs like Qwen-Math, MiMo, and Phi-4 possess immense reasoning potential inherited from the pre-training stage. With reinforcement learning (RL), these models can improve dramatically on reasoning tasks. Recent studies have shown that even RL on a single problem can unleash these models' reasoning capabilities. However, RL is not only expensive but also unstable. Even one-shot RL requires hundreds of GPU hours. This raises a critical question: Is there a more efficient way to unleash the reasoning potential of these powerful base LLMs? In this work, we demonstrate that Critique Fine-Tuning (CFT) on only one problem can effectively unleash the reasoning potential of LLMs. Our method constructs critique data by collecting diverse model-generated solutions to a single problem and using teacher LLMs to provide detailed critiques. We fine-tune Qwen and Llama…
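The data construction step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: `CritiqueExample`, `build_critique_dataset`, the prompt template, and the stub generator and teacher functions are all hypothetical names, and real generators and teachers would be calls to LLMs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CritiqueExample:
    prompt: str  # problem plus one candidate solution, shown to the model
    target: str  # teacher critique the model is fine-tuned to produce


def build_critique_dataset(
    problem: str,
    generators: List[Callable[[str], str]],
    teachers: List[Callable[[str, str], str]],
    samples_per_generator: int = 2,
) -> List[CritiqueExample]:
    """Collect candidate solutions to ONE problem, then pair each with critiques."""
    solutions = []
    for generate in generators:
        for _ in range(samples_per_generator):
            solutions.append(generate(problem))

    # Drop exact duplicates while preserving order, so examples stay diverse.
    unique_solutions = list(dict.fromkeys(solutions))

    examples = []
    for solution in unique_solutions:
        for critique in teachers:
            # Assumed prompt layout; the paper's actual template may differ.
            prompt = (
                f"Problem:\n{problem}\n\n"
                f"Candidate solution:\n{solution}\n\n"
                "Critique this solution."
            )
            examples.append(CritiqueExample(prompt, critique(problem, solution)))
    return examples


# Hypothetical stand-ins for model calls, deterministic so the sketch runs offline.
def stub_generator_a(problem: str) -> str:
    return "2 + 2 = 5"


def stub_generator_b(problem: str) -> str:
    return "2 + 2 = 4"


def stub_teacher(problem: str, solution: str) -> str:
    verdict = "correct" if solution.endswith("4") else "incorrect"
    return f"The solution '{solution}' is {verdict}."


if __name__ == "__main__":
    data = build_critique_dataset(
        "What is 2 + 2?", [stub_generator_a, stub_generator_b], [stub_teacher]
    )
    for example in data:
        print(example.target)
```

Each resulting pair would then serve as one supervised fine-tuning example, with the critique as the training target rather than the solution itself.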

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Evolutionary Algorithms and Applications · Topic Modeling

Methods: Balanced Selection · LLaMA