AutoPSV: Automated Process-Supervised Verifier
Jianqiao Lu, Zhiyang Dou, Hongru Wang, Zeyu Cao, Jianbo Dai, Yingjia Wan, Zhijiang Guo

TL;DR
AutoPSV is a novel method that automatically annotates reasoning steps in large language models, improving error detection and answer selection without extensive manual labeling or high computational costs.
Contribution
AutoPSV introduces an automated process annotation technique that enhances reasoning verification in LLMs by leveraging confidence score changes, reducing manual effort.
Findings
Effective error detection in reasoning steps.
Improved answer selection accuracy.
Significant performance gains on multiple datasets.
Abstract
In this work, we propose a novel method named Automated Process-Supervised Verifier (AutoPSV) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. AutoPSV begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verification model's confidence scores across reasoning steps to automatically annotate the reasoning process, enabling error detection even in scenarios where ground truth answers are unavailable. This alleviates the need for numerous manual annotations or the high computational costs…
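The annotation idea described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the per-step confidence scores have already been produced by a trained verifier, it assumes the "relative change" is the difference between consecutive scores divided by the earlier score, and the function names and the threshold value of -0.5 are hypothetical choices made only for this example.

```python
from typing import List


def relative_confidence_changes(scores: List[float], eps: float = 1e-8) -> List[float]:
    """Relative change in verifier confidence between consecutive steps.

    Assumed form: (score[t+1] - score[t]) / score[t]. The exact formula
    used in the paper is not given in this summary, so treat it as an
    illustrative assumption. `eps` guards against division by zero.
    """
    return [(nxt - cur) / max(cur, eps) for cur, nxt in zip(scores, scores[1:])]


def annotate_steps(scores: List[float], threshold: float = -0.5) -> List[bool]:
    """Label each transition as plausible (True) or likely erroneous (False).

    A transition is flagged as likely erroneous when the relative
    confidence change falls below `threshold`. The default threshold is a
    hypothetical value, not one reported by the authors.
    """
    return [change >= threshold for change in relative_confidence_changes(scores)]


if __name__ == "__main__":
    # Hypothetical verifier confidences after each of four reasoning steps.
    step_scores = [0.80, 0.82, 0.30, 0.28]
    print(relative_confidence_changes(step_scores))
    print(annotate_steps(step_scores))
```

In this toy input the confidence drops sharply between the second and third step, so only that transition is flagged. No ground-truth answer is consulted at annotation time, which matches the abstract's claim that errors can be detected when ground truth answers are unavailable.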
Peer Reviews
Decision·NeurIPS 2024 poster
The paper is carefully written and provides good background for readers (like myself) not very familiar with the approaches considered. The idea of using the difference in the confidence of an OSV model at step t and at t+1 is interesting; it's perhaps a bit surprising that it actually works well, as it's not often easy to learn how good an early intermediate step is towards achieving the final goal. The findings in section 4 seem to lay out a good justification for the design choices behind Au
Not having familiarity with the area, I wasn't able to judge the novelty of the work and the substance (i.e., whether there is enough new material to warrant publication). It seems to me that f_theta() being a good indicator of the probability of reaching the correct final outcome is something known from reference [22]. The new part here is looking at the delta of f_theta() between time steps t and t+1, as done in Eq (3). In terms of naming, I found the use of the phrase "confidence variation"
Taxonomy
Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services
