Uncertainty-Based Methods for Automated Process Reward Data Construction and Output Aggregation in Mathematical Reasoning
Jiuzhou Han, Wray Buntine, Ehsan Shareghi

TL;DR
This paper introduces an uncertainty-driven framework for automating the construction of process reward data and output aggregation in mathematical reasoning, improving the training and performance of process-level reward models (PRMs).
Contribution
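The paper proposes an uncertainty-driven framework for automated process reward data construction, covering both data generation and annotation, together with two uncertainty-aware output aggregation methods that improve the reasoning accuracy of large language models.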
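The summary does not say how uncertainty is measured during data construction. As a purely illustrative sketch, assume that several rollouts are continued from each partial solution (step prefix). The spread of their final answers can then act as an uncertainty signal for deciding which steps to annotate. The entropy measure, the Monte Carlo labelling rule (borrowed from common MC-based PRM annotation), and every function name below are hypothetical. They are not the paper's method.

```python
import math
from collections import Counter
from typing import List, Sequence


def answer_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in nats) of the empirical distribution of final answers.

    Hypothetical uncertainty measure: 0.0 when all rollouts agree,
    larger when rollouts from the same step prefix disagree.
    """
    counts = Counter(answers)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def mc_step_label(rollout_answers: Sequence[str], gold: str) -> int:
    """Hard Monte Carlo step label (assumed rule): 1 if any rollout
    continued from this step prefix reaches the gold answer, else 0."""
    return int(any(a == gold for a in rollout_answers))


def select_uncertain_steps(step_rollouts: List[List[str]], threshold: float) -> List[int]:
    """Return indices of steps whose rollout answers have entropy above
    `threshold` (a hypothetical tuning parameter)."""
    return [i for i, answers in enumerate(step_rollouts)
            if answer_entropy(answers) > threshold]
```

For example, `select_uncertain_steps([["4", "4"], ["4", "5"]], 0.5)` flags only the second step, because its rollouts disagree (entropy ln 2 ≈ 0.69). Concentrating annotation effort on such steps is one plausible way to spend a fixed rollout budget, but the paper may use a different criterion.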
Findings
The framework improves data quality and training efficiency for PRMs.
The aggregation methods outperform traditional voting approaches.
Experimental results show enhanced reasoning performance across multiple datasets.
Abstract
Large language models have demonstrated remarkable capabilities in complex mathematical reasoning tasks, but they inevitably generate errors throughout multi-step solutions. Process-level Reward Models (PRMs) have shown great promise by providing supervision and evaluation at each intermediate step, thereby effectively improving the models' reasoning abilities. However, training effective PRMs requires high-quality process reward data, yet existing methods for constructing such data are often labour-intensive or inefficient. In this paper, we propose an uncertainty-driven framework for automated process reward data construction, encompassing both data generation and annotation processes for PRMs. Additionally, we identify the limitations of both majority vote and PRMs, and introduce two generic uncertainty-aware output aggregation methods: Hybrid Majority Reward Vote and Weighted Reward…
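The abstract contrasts majority vote with PRM-based selection. The sketch below shows those two baselines plus reward-weighted voting, a commonly used combination that sums PRM scores per final answer. It also includes one hypothetical hybrid that trusts the vote only when answer agreement is high. Each candidate is assumed to be a `(final_answer, prm_score)` pair with a single solution-level PRM score (for instance, the minimum step reward). `hybrid_vote` and its `min_share` threshold are illustrative assumptions. This is not the paper's Hybrid Majority Reward Vote or its second method, whose definitions are not given here.

```python
from collections import Counter, defaultdict
from typing import List, Tuple

# (final_answer, prm_score); the solution-level PRM score is an assumed input.
Candidate = Tuple[str, float]


def majority_vote(cands: List[Candidate]) -> str:
    """Return the most frequent final answer (ties go to the first seen)."""
    return Counter(a for a, _ in cands).most_common(1)[0][0]


def prm_best_of_n(cands: List[Candidate]) -> str:
    """Return the answer of the single highest-scoring solution."""
    return max(cands, key=lambda c: c[1])[0]


def reward_weighted_vote(cands: List[Candidate]) -> str:
    """Sum PRM scores per final answer and return the answer with the largest total."""
    totals = defaultdict(float)
    for answer, score in cands:
        totals[answer] += score
    return max(totals, key=totals.get)


def hybrid_vote(cands: List[Candidate], min_share: float = 0.5) -> str:
    """Hypothetical hybrid: if the top answer's vote share exceeds `min_share`
    (low answer uncertainty), keep the majority answer; otherwise defer to
    the reward-weighted vote."""
    counts = Counter(a for a, _ in cands)
    top, top_count = counts.most_common(1)[0]
    if top_count / len(cands) > min_share:
        return top
    return reward_weighted_vote(cands)
```

With `cands = [("12", 0.2), ("12", 0.3), ("15", 0.9), ("7", 0.1)]`, majority vote returns `"12"`, while best-of-N and reward-weighted voting both return `"15"`. `hybrid_vote` returns `"15"` at `min_share=0.5`, because the vote share is only 0.5, but it returns `"12"` at `min_share=0.4`. The threshold makes explicit the trade-off between trusting answer agreement and trusting the reward model.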
