AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification

Xiaoyu Tan; Tianchu Yao; Chao Qu; Bin Li; Minghao Yang; Dakuan Lu; Haozhe Wang; Xihe Qiu; Wei Chu; Yinghui Xu; Yuan Qi

arXiv:2502.11520·cs.CL·February 18, 2025

TL;DR

AURORA is an automated training framework for universal process reward models that uses ensemble prompting and reverse verification to improve evaluation accuracy across diverse policies and complex reasoning tasks.

Contribution

It introduces a novel two-phase automated training framework for process reward models, enhancing robustness and accuracy in complex reasoning scenarios.

Findings

01. Improves process evaluation accuracy across diverse policies.

02. Enhances reward model performance on long Chain-of-Thought outputs.

03. Extends benchmark evaluations with UniversalBench.

Abstract

The reasoning capabilities of advanced large language models (LLMs) like o1 have revolutionized artificial intelligence applications. Nevertheless, evaluating and optimizing complex reasoning processes remain significant challenges due to diverse policy distributions and the inherent limitations of human effort and accuracy. In this paper, we present AURORA, a novel automated framework for training universal process reward models (PRMs) using ensemble prompting and reverse verification. The framework employs a two-phase approach: First, it uses diverse prompting strategies and ensemble methods to perform automated annotation and evaluation of processes, ensuring robust assessments for reward learning. Second, it leverages practical reference answers for reverse verification, enhancing the model's ability to validate outputs and improving training accuracy. To assess the framework's…
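
The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Judge` signature, the `ensemble_label_steps` function, the majority-vote rule, and the toy judges are all assumptions made for illustration. In a real pipeline each judge would be an LLM call with a different prompting strategy; here they are canned stand-ins. Reverse verification is read here as "the judge is shown the reference answer while assessing the process", which is one plausible interpretation of the abstract.

```python
from collections import Counter
from typing import Callable, List, Optional

# Hypothetical judge interface (an assumption, not from the paper):
# (question, steps, step_index, reference_answer) -> True if the step looks correct.
Judge = Callable[[str, List[str], int, Optional[str]], bool]


def ensemble_label_steps(
    question: str,
    steps: List[str],
    judges: List[Judge],
    reference_answer: Optional[str] = None,
) -> List[bool]:
    """Label every reasoning step by majority vote over several judges.

    Ties are resolved conservatively as "incorrect" (an arbitrary choice).
    """
    labels = []
    for i in range(len(steps)):
        votes = [judge(question, steps, i, reference_answer) for judge in judges]
        counts = Counter(votes)
        labels.append(counts[True] > counts[False])
    return labels


def canned_judge(verdicts: List[bool]) -> Judge:
    """Toy stand-in for an LLM prompt: returns fixed per-step verdicts."""
    return lambda question, steps, i, reference_answer: verdicts[i]


def reference_judge(question, steps, i, reference_answer):
    """Toy reverse-verification judge: checks only the final step,
    accepting it when it mentions the reference answer."""
    if reference_answer is None or i < len(steps) - 1:
        return True
    return reference_answer in steps[i]


question = "Compute (2 + 3) * 2."
steps = ["2 + 3 = 5", "5 * 2 = 11", "So the answer is 11"]
judges = [
    canned_judge([True, False, False]),
    canned_judge([True, False, True]),
    reference_judge,
]

labels = ensemble_label_steps(question, steps, judges, reference_answer="10")
print(labels)  # [True, False, False]
```

In this toy run the two canned judges disagree on the final step, and the reference-aware judge settles the vote. Step-level labels of this kind are what a process reward model would then be trained on.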

Code & Models

Models

🤗 infly/Universal-PRM-7B
model · 8 downloads · 8 likes

Taxonomy

Topics: Business Process Modeling and Analysis