Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Yubo Wang, Xiang Yue, Wenhu Chen

TL;DR
This paper introduces Critique Fine-Tuning (CFT), a novel training method that improves reasoning in language models by teaching them to critique responses rather than imitate them, outperforming traditional supervised fine-tuning.
Contribution
CFT is a new fine-tuning approach that trains models to critique noisy responses, leading to significant improvements in reasoning tasks with less data and compute.
Findings
CFT outperforms SFT by 4-10% across six mathematical reasoning benchmarks.
CFT achieves comparable or better results with less training data and compute.
CFT enhances instruction-following and general generation capabilities.
Abstract
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we propose Critique Fine-Tuning (CFT), a method more effective than SFT for reasoning tasks. Instead of simply imitating correct responses, CFT trains models to critique noisy responses, inspired by human learning processes that emphasize critical thinking, deeper analysis, and nuanced understanding - traits often overlooked by standard SFT. To validate the effectiveness of CFT, we construct multiple critique datasets (e.g., WebInstruct, MetaMath, NuminaMath), where GPT-4o serves as the teacher to generate critiques in the form of ([query; noisy response], critique). Experiments on these datasets demonstrate that CFT consistently outperforms SFT by 4-10% across six mathematical reasoning benchmarks, and is effective across different base models…
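The ([query; noisy response], critique) format described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the prompt template, the `CritiqueExample` class, the whitespace `toy_tokenize` function, and the sample data are hypothetical and are not taken from the paper. The sketch only shows the general idea that the query and noisy response form the input, while the training loss would be computed on the critique tokens alone.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CritiqueExample:
    """One hypothetical CFT sample: ([query; noisy response], critique)."""
    query: str
    noisy_response: str
    critique: str


def build_prompt(ex: CritiqueExample) -> str:
    # Assumed template: the source does not specify how the query and
    # the noisy response are concatenated.
    return (
        f"Question:\n{ex.query}\n\n"
        f"Candidate response:\n{ex.noisy_response}\n\n"
        "Critique:\n"
    )


def toy_tokenize(text: str) -> List[str]:
    # Stand-in for a real tokenizer: splits on whitespace.
    return text.split()


def build_training_pair(ex: CritiqueExample) -> Tuple[List[str], List[bool]]:
    """Return the full token sequence and a mask that is True only on
    critique tokens, i.e. the positions that would contribute to the loss."""
    prompt_tokens = toy_tokenize(build_prompt(ex))
    critique_tokens = toy_tokenize(ex.critique)
    tokens = prompt_tokens + critique_tokens
    loss_mask = [False] * len(prompt_tokens) + [True] * len(critique_tokens)
    return tokens, loss_mask


# Made-up sample: the noisy response contains an arithmetic error.
ex = CritiqueExample(
    query="What is 7 * 8?",
    noisy_response="7 * 8 = 54",
    critique="Incorrect: 7 * 8 = 56, not 54.",
)
tokens, loss_mask = build_training_pair(ex)

for token, trained in zip(tokens, loss_mask):
    print(f"{'train' if trained else 'skip ':5s} {token}")
```

By contrast, an SFT sample under the same sketch would use the query alone as the input and the annotated response as the trained span. In a PyTorch-style training loop, masked positions are commonly given the label -100, which is the default `ignore_index` of `torch.nn.CrossEntropyLoss`.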
Taxonomy
Topics: Education and Critical Thinking Development
Methods: Balanced Selection · Shrink and Fine-Tune
