Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning

Jiuzhou Han; Wray Buntine; Ehsan Shareghi

arXiv:2508.01773 · cs.AI · August 5, 2025



TL;DR

This paper introduces an uncertainty-driven framework that automates the construction of process reward data for training process-level reward models (PRMs), together with uncertainty-aware methods for aggregating model outputs in mathematical reasoning.

Contribution

It proposes an uncertainty-based method for automatically generating and annotating process reward data, and two output aggregation techniques that improve the reasoning accuracy of large language models.

Findings

01

The framework improves data quality and training efficiency for PRMs.

02

The aggregation methods outperform traditional voting approaches.

03

Experimental results show enhanced reasoning performance across multiple datasets.

Abstract

Large language models have demonstrated remarkable capabilities in complex mathematical reasoning tasks, but they inevitably generate errors throughout multi-step solutions. Process-level Reward Models (PRMs) have shown great promise by providing supervision and evaluation at each intermediate step, thereby effectively improving the models' reasoning abilities. However, training effective PRMs requires high-quality process reward data, yet existing methods for constructing such data are often labour-intensive or inefficient. In this paper, we propose an uncertainty-driven framework for automated process reward data construction, encompassing both data generation and annotation processes for PRMs. Additionally, we identify the limitations of both majority vote and PRMs, and introduce two generic uncertainty-aware output aggregation methods: Hybrid Majority Reward Vote and Weighted Reward…
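The abstract contrasts majority voting with PRM-based aggregation but is cut off before it defines the paper's own methods. As a rough illustration of the two baselines it names, the sketch below compares a plain majority vote over final answers with a generic reward-weighted vote. It is not the paper's Hybrid Majority Reward Vote or its weighted method. It assumes each sampled solution has a final-answer string and a list of per-step PRM scores, and it collapses the step scores into one solution score by taking the minimum, which is one common convention rather than anything stated here. The function names and the example data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent final answer; ties go to the answer seen first."""
    counts = Counter(answers)
    return max(counts, key=counts.__getitem__)


def solution_score(step_scores: Sequence[float]) -> float:
    """Collapse per-step PRM scores into one score (min is an assumed convention)."""
    return min(step_scores)


def reward_weighted_vote(answers: Sequence[str], scores: Sequence[float]) -> str:
    """Return the answer whose supporting solutions have the largest total score."""
    if len(answers) != len(scores):
        raise ValueError("answers and scores must have the same length")
    totals: Dict[str, float] = defaultdict(float)
    for answer, score in zip(answers, scores):
        totals[answer] += score
    return max(totals, key=totals.__getitem__)


# Hypothetical example: five sampled solutions, each with two scored steps.
answers = ["12", "15", "15", "12", "12"]
step_scores: List[List[float]] = [
    [0.9, 0.8], [0.95, 0.9], [0.9, 0.85], [0.3, 0.2], [0.4, 0.1],
]
scores = [solution_score(s) for s in step_scores]
print(majority_vote(answers))                 # 12
print(reward_weighted_vote(answers, scores))  # 15
```

Here the majority vote picks 12, while the reward-weighted vote picks 15 because the PRM scores the three solutions reaching 12 much lower. This is the kind of disagreement between frequency and reward signals that motivates combining them; how the paper's two methods resolve it is not described in the truncated abstract.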


