Improve Mathematical Reasoning in Language Models by Automated Process Supervision
Liangchen Luo, Yinxiao Liu, Rosanne Liu, Samrat Phatale, Meiqi Guo, Harsh Lara, Yunxuan Li, Lei Shu, Yun Zhu, Lei Meng, Jiao Sun, Abhinav Rastogi

TL;DR
This paper introduces OmegaPRM, an automated Monte Carlo Tree Search algorithm that efficiently collects process supervision data to significantly improve the mathematical reasoning capabilities of large language models without human intervention.
Contribution
The authors develop OmegaPRM, a scalable, automated method for collecting high-quality process supervision data that yields substantial performance gains for LLMs on math reasoning tasks.
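The key step in collecting process supervision is finding where a solution first goes wrong. The full paper describes OmegaPRM as locating the first error in a chain of thought by binary search, using Monte Carlo rollouts to judge each prefix. The sketch below shows only that binary-search idea under simplifying assumptions: `rollout` is a hypothetical stand-in for sampling an LLM completion from a prefix and checking its final answer, a prefix counts as recoverable if at least one of `k` rollouts succeeds, and correctness is assumed monotone (once a prefix is wrong, every longer prefix is wrong). It omits the tree search, rollout reuse, and balanced selection of the actual algorithm.

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for "sample one LLM completion from this prefix and
# check whether it reaches the correct final answer".
Rollout = Callable[[List[str]], bool]


def mc_value(prefix: List[str], rollout: Rollout, k: int) -> float:
    """Monte Carlo estimate: fraction of k rollouts from `prefix` that succeed."""
    return sum(rollout(prefix) for _ in range(k)) / k


def locate_first_error(steps: List[str], rollout: Rollout, k: int = 8) -> Optional[int]:
    """Return the 0-based index of the first incorrect step, or None if the
    full solution still has a positive Monte Carlo value.

    Assumes the empty prefix is solvable and that correctness is monotone:
    once a prefix has value 0, every longer prefix also has value 0.
    """
    if not steps or mc_value(steps, rollout, k) > 0:
        return None
    lo, hi = 0, len(steps)  # invariant: steps[:lo] is good, steps[:hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if mc_value(steps[:mid], rollout, k) > 0:
            lo = mid  # prefix still recoverable: the error lies later
        else:
            hi = mid  # prefix already wrong: the error is at or before step mid - 1
    return hi - 1


# Toy usage: a deterministic rollout that succeeds iff no erroneous step has appeared.
steps = ["s0", "s1", "s2", "ERR", "s4", "s5", "s6", "s7"]
toy_rollout = lambda prefix: "ERR" not in prefix
print(locate_first_error(steps, toy_rollout))  # 3
```

With `k=8` and eight steps, this evaluates four prefixes (32 rollouts) instead of all eight (64 rollouts); the number of prefix evaluations grows roughly with log2 of the solution length rather than linearly.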
Findings
Improved Gemini Pro's success rate on MATH500 from 51% to 69.4%.
Improved Gemma2 27B's success rate on MATH500 from 42.3% to 58.2%.
Collected over 1.5 million process supervision annotations without human intervention.
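For reference, the findings above translate into absolute and relative gains as follows. The helper name `gains` is illustrative; the input numbers are taken directly from the findings.

```python
def gains(base: float, improved: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = improved - base
    relative = 100 * absolute / base
    return round(absolute, 1), round(relative, 1)


print(gains(51.0, 69.4))  # Gemini Pro on MATH500 -> (18.4, 36.1)
print(gains(42.3, 58.2))  # Gemma2 27B on MATH500 -> (15.9, 37.6)
```

That is, roughly a 36% relative improvement for Gemini Pro and about 37.6% for Gemma2 27B.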
Abstract
Complex multi-step reasoning tasks, such as solving mathematical problems or generating code, remain a significant hurdle for even the most advanced large language models (LLMs). Verifying LLM outputs with an Outcome Reward Model (ORM) is a standard inference-time technique aimed at enhancing the reasoning performance of LLMs. However, this still proves insufficient for reasoning tasks with a lengthy or multi-hop reasoning chain, where the intermediate outcomes are neither properly rewarded nor penalized. Process supervision addresses this limitation by assigning intermediate rewards during the reasoning process. To date, the methods used to collect process supervision data have relied on either human annotation or per-step Monte Carlo estimation, both prohibitively expensive to scale, thus hindering the broad application of this technique. In response to this challenge, we propose a…
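To make the abstract's contrast concrete: outcome supervision assigns one label to the whole solution, whereas per-step Monte Carlo estimation (the costly baseline the abstract mentions) labels every prefix by the fraction of sampled completions that reach the correct answer. In this minimal sketch, `rollout` is a hypothetical deterministic stand-in for sampling an LLM; with a real model the per-step values would fall anywhere in [0, 1].

```python
from typing import Callable, List

# Hypothetical stand-in for "sample a completion from this prefix and check the final answer".
Rollout = Callable[[List[str]], bool]


def outcome_label(steps: List[str], rollout: Rollout) -> int:
    """Outcome (ORM-style) supervision: a single 0/1 label for the complete solution."""
    return int(rollout(steps))


def per_step_mc_labels(steps: List[str], rollout: Rollout, k: int = 8) -> List[float]:
    """Naive per-step Monte Carlo estimation: for each prefix steps[:i + 1],
    the fraction of k completions that reach the correct answer.
    Costs len(steps) * k rollouts per solution."""
    return [
        sum(rollout(steps[: i + 1]) for _ in range(k)) / k
        for i in range(len(steps))
    ]


steps = ["s0", "s1", "s2", "ERR", "s4"]
toy_rollout = lambda prefix: "ERR" not in prefix  # succeeds iff no erroneous step yet
print(outcome_label(steps, toy_rollout))       # 0: the whole solution is penalized
print(per_step_mc_labels(steps, toy_rollout))  # [1.0, 1.0, 1.0, 0.0, 0.0]
```

The outcome label penalizes the three correct opening steps along with the error, while the per-step labels reward them; the price is one batch of `k` rollouts for every step, which is what makes this baseline expensive to scale.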
Taxonomy
Topics: Business Process Modeling and Analysis
Methods: Balanced Selection
