REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul, Mete Ismayilzada, Maxime Peyrard, Beatriz Borges, Antoine Bosselut, Robert West, and Boi Faltings

TL;DR
REFINER is a framework that improves language-model reasoning by having a critic model give automated feedback on intermediate reasoning steps, raising accuracy without human-in-the-loop data for training the critic.
Contribution
The paper introduces REFINER, a novel method for training language models to generate better intermediate reasoning steps through critic feedback, improving reasoning performance.
Findings
Significant improvements over baseline LMs of comparable scale on three diverse reasoning tasks.
With GPT-3.5 or ChatGPT as the reasoner, the trained critic improves reasoning without finetuning the reasoner.
Critic trained without human-in-the-loop data.
Abstract
Language models (LMs) have recently shown remarkable performance on reasoning tasks by explicitly generating intermediate inferences, e.g., chain-of-thought prompting. However, these intermediate inference steps may be inappropriate deductions from the initial context and lead to incorrect final predictions. Here we introduce REFINER, a framework for finetuning LMs to explicitly generate intermediate reasoning steps while interacting with a critic model that provides automated feedback on the reasoning. Specifically, the critic provides structured feedback that the reasoning LM uses to iteratively improve its intermediate arguments. Empirical evaluations of REFINER on three diverse reasoning tasks show significant improvements over baseline LMs of comparable scale. Furthermore, when using GPT-3.5 or ChatGPT as the reasoner, the trained critic significantly improves reasoning without…
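The generate, critique, and revise interaction described in the abstract can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Feedback` fields, the `refine` function, the `max_rounds` cap, and the toy generator and critic are all hypothetical stand-ins for the finetuned reasoning LM and the trained critic.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feedback:
    """Hypothetical structured feedback: which step is flawed, plus a hint."""
    step_index: int
    hint: str


# Assumed interfaces: the generator maps (question, feedback so far) to a list
# of reasoning steps; the critic returns Feedback, or None if it finds no issue.
Generator = Callable[[str, List[Feedback]], List[str]]
Critic = Callable[[str, List[str]], Optional[Feedback]]


def refine(question: str, generator: Generator, critic: Critic,
           max_rounds: int = 3) -> List[str]:
    """Regenerate the intermediate steps until the critic has no feedback
    or max_rounds critique rounds have been used."""
    history: List[Feedback] = []
    steps = generator(question, history)
    for _ in range(max_rounds):
        feedback = critic(question, steps)
        if feedback is None:  # critic accepts the current reasoning
            break
        history.append(feedback)
        steps = generator(question, history)
    return steps


def toy_generator(question: str, history: List[Feedback]) -> List[str]:
    """Toy stand-in for the reasoning LM: wrong at first, fixed after feedback."""
    if not history:
        return ["2 + 3 = 5", "5 * 4 = 20"]
    return ["3 * 4 = 12", "2 + 12 = 14"]


def toy_critic(question: str, steps: List[str]) -> Optional[Feedback]:
    """Toy stand-in for the critic: flags a first step that skips multiplication."""
    if "*" not in steps[0]:
        return Feedback(step_index=0, hint="apply multiplication before addition")
    return None


final_steps = refine("2 + 3 * 4", toy_generator, toy_critic)
print(final_steps)
```

In this toy run the critic rejects the first attempt once and accepts the revision. In the setting the abstract describes, both callables would be models rather than hand-written rules.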
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Cosine Annealing · Weight Decay · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Linear Warmup With Cosine Annealing · Adam
