Critique-Guided Distillation for Robust Reasoning via Refinement

Berkcan Kapusuzoglu; Supriyo Chakraborty; Zain Sarwar; Chia-Hsuan Lee; Sambit Sahu

arXiv:2505.11628·cs.CL·May 20, 2026

TL;DR

The paper introduces Critique-Guided Distillation (CGD), a training method in which a student model learns to refine flawed responses conditioned on teacher critiques. The critiques are used only during training and are not required at inference, and the method improves performance on reasoning tasks.

Contribution

CGD decouples critique consumption from generation, enabling models to internalize error-aware reasoning through critique-guided training, outperforming prior critique-based methods.

Findings

01. CGD achieves a 7% average improvement on reasoning benchmarks.
02. CGD improves performance on challenging competition problems such as AIME24 and AIME25.
03. CGD preserves instruction-following abilities better than Critique Fine-Tuning (CFT).

Abstract

Supervised fine-tuning with expert demonstrations often produces models that imitate outputs without internalizing the reasoning processes needed for robust generalization. While critique-based approaches show promise, training models to generate critiques directly, such as Critique Fine-Tuning (CFT), can lead to output-format drift and degradation of general capabilities. We propose Critique-Guided Distillation (CGD), a training framework that decouples critique consumption from critique generation. During fine-tuning, the student is trained to refine flawed responses conditioned on teacher critiques. CGD treats critiques as a training-time-only supervision signal, encouraging internalization of error-aware reasoning: critiques guide learning but are absent at inference. Controlled ablations confirm that these reasoning gains are directly driven by the specificity and…
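The asymmetry between training and inference described in the abstract can be illustrated with a small data-formatting sketch. This is not the authors' implementation: the prompt template, the `CGDExample` class, and the function names are hypothetical, and the source of the refined target is not specified in this excerpt. The sketch only shows the general shape, in which the critique appears in the training input but not in the inference input.

```python
from dataclasses import dataclass


@dataclass
class CGDExample:
    """One hypothetical training record (field names are assumptions)."""
    prompt: str             # the original question
    initial_response: str   # a possibly flawed first answer
    critique: str           # teacher feedback on that answer
    refined_response: str   # the improved answer used as the training target


def build_training_pair(ex: CGDExample) -> tuple[str, str]:
    """Return (model input, target) for critique-conditioned refinement.

    The critique is part of the input, so the model consumes critiques
    but is never trained to generate them.
    """
    source = (
        f"Question:\n{ex.prompt}\n\n"
        f"Initial answer:\n{ex.initial_response}\n\n"
        f"Critique:\n{ex.critique}\n\n"
        "Refined answer:\n"
    )
    return source, ex.refined_response


def build_inference_input(prompt: str) -> str:
    """At inference only the question is given; no critique is present."""
    return f"Question:\n{prompt}\n\nAnswer:\n"


if __name__ == "__main__":
    example = CGDExample(
        prompt="What is 2 + 2?",
        initial_response="5",
        critique="The addition is off by one.",
        refined_response="4",
    )
    source, target = build_training_pair(example)
    print(source + target)
    print(build_inference_input(example.prompt))
```

In a fine-tuning loop, the loss would typically be computed only on the target tokens, so the critique acts as conditioning context and is not something the model learns to emit.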

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

- method is simple and easy to understand
- I like that it included both Llama and Qwen
- I also like that it explored both in-family (Llama student + Llama teacher) and cross-family (Qwen student + S1 teacher)
- nice that the model is able to retain general instruction following, since this is something that is often lost when doing imitation SFT
- generally quite comprehensive evaluation sets
- strong results
- ablation experiments are nice. I especially liked section 4.2.1 (comparison

Weaknesses

- because this field is so popular and this paper's contribution is relatively simple, I wouldn't be surprised if there are a few other concurrent works submitted to this conference that explore a very similar idea of incorporating critiques in SFT data.
- (I realize this is another way to say "lacks novelty"..., though I think it's slightly more nuanced than that, since this subtopic is one that's currently very popular)
- Even within this simple method, I think there are a few other areas t

Reviewer 02 · Rating 4 · Confidence 3

Strengths

### **1. Integrating Critique and Correction**

The paper introduces a well-motivated training framework that unifies critique understanding and refinement learning within a single fine-tuning stage.

### **2. Strong Empirical Performance Across Multiple Benchmarks**

The authors provide comprehensive experimental validation across both mathematical and general reasoning benchmarks. CGD achieves large and consistent gains (e.g., +15% on AMC23, +12.2% on MATH-500) over SFT and CFT baselines.

Weaknesses

1. **Novelty Concern and Overlap with Prior Work** Although CGD is presented as a novel fine-tuning paradigm, its conceptual foundation bears strong resemblance to prior works such as **ORCA (Mukherjee et al., 2023)** and **Chain-of-Thought Distillation (Li et al., 2024)**. These earlier methods also transfer reasoning traces or critique signals from a stronger teacher to a smaller student. CGD’s main distinction — conditioning the student on both its own response and the teacher’s critiqu

Reviewer 03 · Rating 2 · Confidence 4

Strengths

1. Motivated by limitations of vanilla SFT and CFT.
2. Improves several math-reasoning benchmarks compared to SFT and CFT.
3. Preserves general instruction-following where CFT degrades it.

Weaknesses

1. Insufficient motivation. While CGD exhibits empirical gains, the paper does not convincingly explain why conditioning on critiques during training, but omitting them at inference, should improve from-scratch reasoning. During training, the student learns to rely on critique signals that are not available at inference. The paper does not explain how critique-conditioned refinements translate into unconditional answer generation, nor does it analyze whether this reliance introduces brittleness.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Shrink and Fine-Tune