Solving math word problems with process- and outcome-based feedback

Jonathan Uesato; Nate Kushman; Ramana Kumar; Francis Song; Noah Siegel; Lisa Wang; Antonia Creswell; Geoffrey Irving; Irina Higgins

arXiv:2211.14275 · cs.LG · November 28, 2022 · 21 cites

TL;DR

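This paper compares process- and outcome-based supervision for training language models on math word problems. It finds that outcome-based supervision reaches similar final-answer error with less labeling, while process-based feedback (or reward models that emulate it) is needed to reduce errors in the reasoning steps themselves.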

Contribution

It provides the first comprehensive comparison between process- and outcome-based supervision on GSM8K, demonstrating the benefits of process-based feedback for reasoning accuracy.

Findings

1. Outcome-based supervision achieves similar final-answer accuracy with less labeling.
2. Correct reasoning steps require process-based supervision or learned reward models that emulate process-based feedback.
3. Overall, the previous best results improve from 16.8% to 12.7% final-answer error and from 14.0% to 3.4% reasoning error among final-answer-correct solutions.
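The two error rates above can be pinned down with a small sketch. It assumes each sampled solution has already been graded for final-answer correctness and for whether all of its reasoning steps are correct; the `GradedSolution` record and the function names are hypothetical, not taken from the paper's code. Reasoning error is computed only over final-answer-correct solutions, matching how the paper reports it.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class GradedSolution:
    # Hypothetical record for one model sample, graded by a human or script.
    final_answer_correct: bool
    # Assumed annotation: True only if every reasoning step was judged correct.
    all_steps_correct: bool


def final_answer_error(sols: List[GradedSolution]) -> float:
    """Fraction of all solutions whose final answer is wrong."""
    if not sols:
        raise ValueError("need at least one solution")
    return sum(not s.final_answer_correct for s in sols) / len(sols)


def reasoning_error(sols: List[GradedSolution]) -> float:
    """Among final-answer-correct solutions, fraction with a flawed step.

    Returns NaN when no solution has a correct final answer, since the
    metric is undefined on an empty set.
    """
    correct = [s for s in sols if s.final_answer_correct]
    if not correct:
        return math.nan
    return sum(not s.all_steps_correct for s in correct) / len(correct)
```

For example, with four graded samples where one has a wrong final answer and one reaches the right answer through a flawed step, final-answer error is 0.25 and reasoning error is 1/3. Returning NaN when no answer is correct avoids reporting a misleading 0% reasoning error.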

Abstract

Recent work has shown that asking language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use…
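The abstract's distinction can be illustrated by how each kind of feedback becomes per-step training labels. The sketch below is an assumption for illustration, not necessarily the paper's exact recipe: outcome-based labels copy the single final-answer judgment onto every step, while process-based labels come from a hypothetical annotator index marking the first incorrect step, with that step and everything after it labeled incorrect.

```python
from typing import List, Optional


def outcome_labels(num_steps: int, final_answer_correct: bool) -> List[int]:
    """Outcome-based: every step inherits the one final-answer label."""
    return [int(final_answer_correct)] * num_steps


def process_labels(num_steps: int, first_error_step: Optional[int]) -> List[int]:
    """Process-based: steps before the first annotated error are 1, the rest 0.

    first_error_step is a hypothetical 0-based index supplied by a human
    annotator, or None if every step was judged correct.
    """
    if first_error_step is None:
        return [1] * num_steps
    if not 0 <= first_error_step < num_steps:
        raise ValueError("first_error_step out of range")
    return [1] * first_error_step + [0] * (num_steps - first_error_step)
```

For a four-step solution that makes a mistake in its second step yet still lands on the right answer, `outcome_labels(4, True)` gives `[1, 1, 1, 1]` while `process_labels(4, 1)` gives `[1, 0, 0, 0]`. This is the kind of hidden reasoning error that final-answer checks alone cannot detect.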

Code & Models

Datasets

LossFunctionLover/orm-pairwise-preference-pairs
dataset · 12 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning