AutoPSV: Automated Process-Supervised Verifier

Jianqiao Lu; Zhiyang Dou; Hongru Wang; Zeyu Cao; Jianbo Dai; Yingjia Wan; Zhijiang Guo

arXiv:2405.16802 · cs.CL · October 25, 2024

TL;DR

AutoPSV is a novel method that automatically annotates reasoning steps in large language models, improving error detection and answer selection without extensive manual labeling or high computational costs.

Contribution

AutoPSV introduces an automated process annotation technique that enhances reasoning verification in LLMs by leveraging confidence score changes, reducing manual effort.

Findings

01 Effective error detection in reasoning steps.

02 Improved answer selection accuracy.

03 Significant performance gains on multiple datasets.

Abstract

In this work, we propose a novel method named Automated Process-Supervised Verifier (AutoPSV) to enhance the reasoning capabilities of large language models (LLMs) by automatically annotating the reasoning steps. AutoPSV begins by training a verification model on the correctness of final answers, enabling it to generate automatic process annotations. This verification model assigns a confidence score to each reasoning step, indicating the probability of arriving at the correct final answer from that point onward. We detect relative changes in the verifier's confidence scores across reasoning steps to automatically annotate the reasoning process, enabling error detection even in scenarios where ground truth answers are unavailable. This alleviates the need for numerous manual annotations or the high computational costs…
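The annotation idea described in the abstract can be illustrated with a minimal sketch. It assumes the verifier has already produced one confidence score per reasoning step, that "relative change" means the difference between consecutive scores divided by the earlier score, and that a step is flagged when this change falls below a fixed negative threshold. The function names, the exact formula, and the threshold value of -0.5 are illustrative assumptions, not details confirmed by this summary.

```python
from typing import List


def relative_confidence_changes(scores: List[float], eps: float = 1e-8) -> List[float]:
    """Relative change between consecutive per-step confidence scores.

    Assumed form: (score[t+1] - score[t]) / score[t]. The eps guard avoids
    division by zero and is an implementation convenience, not part of the method.
    """
    return [(nxt - cur) / max(cur, eps) for cur, nxt in zip(scores, scores[1:])]


def annotate_steps(scores: List[float], threshold: float = -0.5) -> List[int]:
    """Label each transition: 1 = no sharp drop, 0 = suspected erroneous step.

    The threshold is a hypothetical value chosen for illustration.
    """
    return [0 if delta < threshold else 1 for delta in relative_confidence_changes(scores)]


# Hypothetical verifier scores for a four-step solution.
scores = [0.8, 0.75, 0.25, 0.3]
labels = annotate_steps(scores)
print(labels)  # [1, 0, 1]: the sharp drop from 0.75 to 0.25 is flagged
```

Because the labels depend only on the verifier's own scores, this kind of rule needs no ground truth answer at annotation time, which matches the claim in the abstract.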

Peer Reviews

Decision·NeurIPS 2024 poster

Reviewer 01 · Rating 7 · Confidence 3

Strengths

The paper is carefully written and provides good background for readers (like myself) not very familiar with the approaches considered. The idea of using the difference in the confidence of an OSV model at step t and at t+1 is interesting; it's perhaps a bit surprising that it actually works well, as it's not often easy to learn how good an early intermediate step is towards achieving the final goal. The findings in section 4 seem to lay out a good justification for the design choices behind Au

Weaknesses

Not having familiarity with the area, I wasn't able to judge the novelty of the work and the substance (i.e., whether there is enough new material to warrant publication). It seems to me that f_theta() being a good indicator of the probability of reaching the correct final outcome is something known from reference [22]. The new part here is looking at the delta of f_theta() between time steps t and t+1, as done in Eq (3). In terms of naming, I found the use of the phrase "confidence variation"

Videos

AutoPSV: Automated Process-Supervised Verifier · SlidesLive

Taxonomy

Topics: Business Process Modeling and Analysis · Semantic Web and Ontologies · Service-Oriented Architecture and Web Services