Improving Value-based Process Verifier via Structural Prior Injection
Zetian Sun, Dongfang Li, Baotian Hu, Jun Yu, Min Zhang

TL;DR
This paper introduces a method to improve value-based process verifiers in LLM reasoning by injecting structural priors into value representations, reducing errors from Monte Carlo sampling and enhancing performance.
Contribution
It proposes a novel approach that incorporates structural priors into value estimation and optimizes distribution alignment to mitigate sampling errors in LLM reasoning tasks.
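To make the idea of a structural prior concrete, here is a minimal Python sketch of a value represented as the expectation of a categorical distribution. It is not the authors' implementation. It assumes the support is the n + 1 possible Monte Carlo estimates 0/n, 1/n, ..., n/n for n rollouts. The function names `softmax` and `categorical_value` are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def categorical_value(logits, num_rollouts):
    """Scalar value as the expectation of a categorical distribution.

    Assumed support (hypothetical, not confirmed by the source): the n + 1
    possible Monte Carlo estimates k / n for k = 0..n, with n = num_rollouts.
    """
    if len(logits) != num_rollouts + 1:
        raise ValueError("expected one logit per atom k / n, k = 0..n")
    probs = softmax(logits)
    atoms = [k / num_rollouts for k in range(num_rollouts + 1)]
    return sum(p * a for p, a in zip(probs, atoms))

# Uniform logits over the 5 atoms for n = 4 rollouts.
print(categorical_value([0.0] * 5, num_rollouts=4))
# Logits sharply peaked on the last atom (k = n).
print(categorical_value([-50.0] * 4 + [50.0], num_rollouts=4))
```

With uniform logits the value comes out at about 0.5, the mean of the atoms. Sharpening the logits toward one atom pulls the value toward that atom's position, here close to 1.0.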
Findings
Structural prior injection improves verifier performance by 1-2 points.
Different priors significantly affect verifier performance even when they share the same optimal solution.
The method effectively reduces sampling errors in Monte Carlo value estimation.
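A short simulation can illustrate the size of the sampling error in a scalar Monte Carlo value. The simulation assumes each rollout independently succeeds with a fixed true probability p, so the estimate k/n from n rollouts is a scaled Binomial(n, p) draw. The helper names and the choice of p = 0.3 and n = 8 are hypothetical.

```python
import math
import random

def mc_value_estimate(p_true, num_rollouts, rng):
    """Monte Carlo value: fraction of successful rollouts, i.e. k / n with k ~ Binomial(n, p_true)."""
    successes = sum(1 for _ in range(num_rollouts) if rng.random() < p_true)
    return successes / num_rollouts

def mc_std_error(p_true, num_rollouts):
    """Analytic standard deviation of k / n under Binomial(n, p_true)."""
    return math.sqrt(p_true * (1.0 - p_true) / num_rollouts)

rng = random.Random(0)          # fixed seed for reproducibility
p_true, n = 0.3, 8              # hypothetical true success rate and rollout budget
estimates = [mc_value_estimate(p_true, n, rng) for _ in range(10_000)]

mean = sum(estimates) / len(estimates)
std = math.sqrt(sum((e - mean) ** 2 for e in estimates) / len(estimates))
print(f"empirical mean {mean:.3f}, empirical std {std:.3f}, analytic std {mc_std_error(p_true, n):.3f}")
```

With only 8 rollouts the standard deviation is about 0.16 (√(0.3 · 0.7 / 8)). A single scalar label from one such run can therefore easily be off from the true value by more than 0.1.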
Abstract
In the Large Language Model (LLM) reasoning scenario, state values are often estimated via Monte Carlo sampling. Although Monte Carlo estimation is an elegant method with less inductive bias, the limited number of samples inevitably introduces noise and errors. To address this problem, we inject a structural prior into the value representation and transform the scalar value into the expectation of a pre-defined categorical distribution, which represents the noise and errors from a distribution perspective. Specifically, by treating the result of Monte Carlo sampling as a single sample from the prior ground-truth Binomial distribution, we quantify the sampling error as the mismatch between the posterior estimated distribution and the ground-truth distribution, and then minimize it via distribution selection optimization. We test the performance of value-based process verifiers on Best-of-N task…
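The distribution view in the abstract can be sketched as follows. This is one possible reading, not the authors' code. It makes the following assumptions:

* The ground-truth distribution over the atoms k/n is Binomial(n, p), where p is the true success rate.
* A single Monte Carlo run is one draw from that distribution.
* Total variation distance stands in for the paper's mismatch measure.

The `plug_in` baseline and all function names are hypothetical.

```python
import math

def binomial_pmf(n, p):
    """P(K = k) for K ~ Binomial(n, p), k = 0..n; atom k stands for the MC estimate k / n."""
    return [math.comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]

def expected_value(probs):
    """Expectation of a categorical distribution over the atoms k / n."""
    n = len(probs) - 1
    return sum(q * k / n for k, q in enumerate(probs))

def total_variation(a, b):
    """Total variation distance; an assumed stand-in for the paper's mismatch measure."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

n, p_true = 4, 0.6                     # hypothetical rollout budget and true success rate
ground_truth = binomial_pmf(n, p_true)

# One Monte Carlo run in which 3 of the 4 rollouts succeed: a single draw k = 3.
observed = [1.0 if k == 3 else 0.0 for k in range(n + 1)]

# Hypothetical baseline: a Binomial plugged in at the observed ratio 3/4.
plug_in = binomial_pmf(n, 3 / n)

print(expected_value(ground_truth))             # equals p_true up to float rounding
print(total_variation(observed, ground_truth))  # equals 1 - P(K = 3)
print(total_variation(plug_in, ground_truth))   # nonzero mismatch caused by sampling error
```

The expectation of Binomial(n, p) over the atoms k/n is exactly p. Matching the ground-truth distribution therefore also recovers the true value. By contrast, a single observed draw, or a Binomial plugged in at the observed ratio, sits at a nonzero distance from the ground truth.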
Taxonomy
Topics: Business Process Modeling and Analysis · Formal Methods in Verification · Advanced Control Systems Optimization
