Improving Value-based Process Verifier via Low-Cost Variance Reduction
Zetian Sun, Dongfang Li, Baotian Hu, Min Zhang

TL;DR
This paper introduces ComMCS, a variance reduction technique for value-based process verifiers in LLM reasoning tasks, improving estimation accuracy without extra inference costs.
Contribution
The paper proposes ComMCS, a novel unbiased variance reduction method for Monte Carlo estimators, enhancing reasoning accuracy in LLM-based process verification.
Findings
ComMCS reduces variance predictably without additional inference cost.
It outperforms baseline methods by 2.2 to 2.8 points on MATH-500.
Empirical results validate the theoretical variance reduction benefits.
Abstract
Large language models (LLMs) have achieved remarkable success in a wide range of tasks. However, their reasoning capabilities, particularly in complex domains like mathematics, remain a significant challenge. Value-based process verifiers, which estimate the probability of a partial reasoning chain leading to a correct solution, are a promising approach for improving reasoning. Nevertheless, their effectiveness is often hindered by estimation error in their training annotations, a consequence of the limited number of Monte Carlo (MC) samples feasible due to the high cost of LLM inference. In this paper, we identify that the estimation error primarily arises from high variance rather than bias, and the MC estimator is a Minimum Variance Unbiased Estimator (MVUE). To address the problem, we propose the Compound Monte Carlo Sampling (ComMCS) method,…
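The abstract's point that the error in MC annotations comes from variance rather than bias can be illustrated with a small simulation. The sketch below is not ComMCS and is not taken from the paper: it assumes each rollout from a partial reasoning chain is an independent Bernoulli outcome with a hypothetical success probability `p_true`, and it compares the empirical variance of the plain MC estimate against the textbook value p(1 - p)/n. All function names and parameter values are illustrative assumptions.

```python
import random
import statistics


def mc_value_estimate(p_true, n_rollouts, rng):
    """Plain MC estimate: the fraction of rollouts that end in a correct answer.

    p_true is a hypothetical ground-truth success probability; in practice it
    is unknown and each rollout would be an LLM completion, not a coin flip.
    """
    outcomes = [1 if rng.random() < p_true else 0 for _ in range(n_rollouts)]
    return sum(outcomes) / n_rollouts


def theoretical_variance(p_true, n_rollouts):
    """Variance of the mean of n independent Bernoulli(p) samples."""
    return p_true * (1.0 - p_true) / n_rollouts


def empirical_moments(p_true, n_rollouts, n_trials=20000, seed=0):
    """Repeat the MC estimate many times and return (mean, variance)."""
    rng = random.Random(seed)
    estimates = [
        mc_value_estimate(p_true, n_rollouts, rng) for _ in range(n_trials)
    ]
    return statistics.fmean(estimates), statistics.pvariance(estimates)


if __name__ == "__main__":
    # Assumed values for illustration only.
    p = 0.3
    for n in (4, 8, 16):
        mean, var = empirical_moments(p, n)
        print(
            f"n={n:2d}  mean={mean:.3f}  "
            f"empirical_var={var:.4f}  theory_var={theoretical_variance(p, n):.4f}"
        )
```

Under these assumptions the mean of the estimates stays close to `p_true` for every `n`, while the spread shrinks only as 1/n, which is why a small rollout budget leaves noisy training labels even though the estimator is unbiased.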
