Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Liangchen Luo; Yinxiao Liu; Rosanne Liu; Samrat Phatale; Meiqi Guo; Harsh Lara; Yunxuan Li; Lei Shu; Yun Zhu; Lei Meng; Jiao Sun; Abhinav Rastogi

arXiv:2406.06592 · cs.CL · December 13, 2024 · 6 cites

TL;DR

This paper introduces OmegaPRM, an automated Monte Carlo Tree Search algorithm that efficiently collects process supervision data to significantly improve the mathematical reasoning capabilities of large language models without human intervention.
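
Much of OmegaPRM's efficiency comes from locating the first incorrect step of a flawed solution with a binary search instead of scoring every step one by one. The following is a minimal sketch of that idea, not the authors' implementation. The oracle `has_correct_rollout` is hypothetical: it stands in for "sample several completions from this prefix and check whether any reaches the reference answer". The search also assumes monotonicity, meaning that once a prefix can no longer reach the right answer, longer prefixes cannot either.

```python
from typing import Callable, Optional, Sequence


def find_first_error(
    steps: Sequence[str],
    has_correct_rollout: Callable[[Sequence[str]], bool],
) -> Optional[int]:
    """Binary-search for the first erroneous step of a solution.

    `has_correct_rollout(prefix)` is a hypothetical oracle that returns True
    if some sampled completion starting from `prefix` reaches the reference
    answer. Assumes monotonicity along the chain of steps.

    Returns the 0-based index of the first bad step, or None if the full
    solution still reaches the correct answer.
    """
    n = len(steps)
    if n == 0 or has_correct_rollout(steps):
        return None
    # Invariant: steps[:lo] is recoverable, steps[:hi] is not.
    # Starting at lo = 0 assumes the question alone (empty prefix) is solvable.
    lo, hi = 0, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if has_correct_rollout(steps[:mid]):
            lo = mid  # steps 0..mid-1 look fine, so the first error has index >= mid
        else:
            hi = mid  # the first error has index <= mid - 1
    return hi - 1  # steps[hi - 1] is what made the prefix unrecoverable


# Toy usage: a simulated oracle in which step index 3 is the first mistake.
steps = [f"step {i}" for i in range(8)]
calls = []


def oracle(prefix: Sequence[str]) -> bool:
    calls.append(len(prefix))  # record which prefix lengths were queried
    return len(prefix) <= 3


print(find_first_error(steps, oracle))  # 3
print(len(calls))  # 4 oracle calls instead of 8 per-step checks
```

Each oracle call still costs several rollouts, but the number of prefixes queried grows roughly with log2 of the number of steps rather than linearly.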

Contribution

The authors develop OmegaPRM, a scalable, automated method for collecting high-quality process supervision data, enabling substantial performance gains for LLMs on math reasoning tasks.

Findings

1. Improved Gemini Pro's success rate on MATH500 from 51% to 69.4%.
2. Boosted Gemma2 27B's success rate on MATH500 from 42.3% to 58.2%.
3. Collected over 1.5 million process supervision annotations without human supervision.
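
The absolute gains listed in the findings correspond to the following relative improvements, computed directly from the reported success rates:

```latex
\frac{69.4 - 51}{51} \approx 0.361 \;(\text{Gemini Pro}), \qquad
\frac{58.2 - 42.3}{42.3} \approx 0.376 \;(\text{Gemma2 27B})
```

That is, roughly a 36% relative improvement for Gemini Pro and roughly 38% for Gemma2 27B.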

Abstract

Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a…

Taxonomy

Topics: Business Process Modeling and Analysis

Methods: Balanced Selection