Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate

Yubo Wang; Xiang Yue; Wenhu Chen

arXiv:2501.17703·cs.CL·April 1, 2025

TL;DR

This paper introduces Critique Fine-Tuning (CFT), a novel training method that improves reasoning in language models by teaching them to critique responses rather than imitate them, outperforming traditional supervised fine-tuning.

Contribution

CFT is a new fine-tuning approach that trains models to critique noisy responses, leading to significant improvements in reasoning tasks with less data and compute.

Findings

01 CFT outperforms SFT by 4-10% across six mathematical reasoning benchmarks.

02 CFT achieves comparable or better results with less training data and compute.

03 CFT enhances instruction-following and general generation capabilities.

Abstract

Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we propose Critique Fine-Tuning (CFT), a method more effective than SFT for reasoning tasks. Instead of simply imitating correct responses, CFT trains models to critique noisy responses, inspired by human learning processes that emphasize critical thinking, deeper analysis, and nuanced understanding - traits often overlooked by standard SFT. To validate the effectiveness of CFT, we construct multiple critique datasets (e.g., WebInstruct, MetaMath, NuminaMath), where GPT-4o serves as the teacher to generate critiques in the form of ([query; noisy response], critique). Experiments on these datasets demonstrate that CFT consistently outperforms SFT by 4-10% across six mathematical reasoning benchmarks, and is effective across different base models…
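The abstract describes CFT samples as ([query; noisy response], critique), in contrast to the (query, response) pairs used for SFT. The sketch below shows one way such pairs could be assembled. It is an illustration under stated assumptions, not the authors' data pipeline: the `Example` class, both builder functions, the prompt template, and the toy arithmetic sample are all hypothetical and do not come from the paper or its repository.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """One training pair: the model reads `prompt` and learns to produce `target`."""
    prompt: str
    target: str


def build_sft_example(query: str, reference_response: str) -> Example:
    # SFT: the input is the query alone; the target is the annotated
    # response that the model is trained to imitate.
    return Example(prompt=query, target=reference_response)


def build_cft_example(query: str, noisy_response: str, critique: str) -> Example:
    # CFT: the input concatenates the query with a (possibly incorrect)
    # response; the target is the teacher-written critique.
    # The template below is a hypothetical placeholder, not the paper's format.
    prompt = (
        f"Question:\n{query}\n\n"
        f"Candidate response:\n{noisy_response}\n\n"
        "Critique:"
    )
    return Example(prompt=prompt, target=critique)


if __name__ == "__main__":
    # Toy sample invented for illustration only.
    cft = build_cft_example(
        query="What is 12 * 13?",
        noisy_response="12 * 13 = 146",
        critique="Incorrect. 12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.",
    )
    print(cft.prompt)
    print(cft.target)
```

The point of the contrast is where the response sits: under SFT it is the training target, while under CFT it moves into the input and the critique becomes the target.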

Code & Models

Repositories

TIGER-AI-Lab/CritiqueFineTuning (PyTorch)

Datasets

TIGER-Lab/WebInstruct-CFT
dataset · 50 downloads

Taxonomy

Topics: Education and Critical Thinking Development

Methods: Balanced Selection · Shrink and Fine-Tune