TL;DR
This paper introduces VerifySteer, a method to control verifier strictness in step-wise verification by latent state intervention, improving accuracy and efficiency without fine-tuning.
Contribution
It identifies a hidden-state signal, located near the boundary of each verification paragraph, that encodes verifier strictness, and proposes a latent steering technique that uses this signal to improve verification performance and efficiency.
Findings
VerifySteer outperforms prompt optimization and activation steering baselines.
It is competitive with self-consistency methods while using less inference compute.
VerifySteer provides additional gains when combined with fine-tuned verifiers.
Abstract
Generative verifiers have emerged as a promising paradigm for step-wise verification, but their verification behavior is often poorly calibrated: they may be under-critical and miss erroneous steps, or over-critical and reject correct reasoning. We refer to this tendency to be overly lenient or overly critical as verifier strictness. In this work, we study whether verifier strictness can be controlled through hidden-state intervention. We uncover a verification-specific hidden-state signal: in step-wise verification, a verifier's tendency to accept or reject a solution step is encoded near the boundary of the corresponding verification paragraph. Exploiting this signal, we show that hidden-state steering can directly modulate verifier strictness without fine-tuning. However, uniform steering induces a trade-off between error detection and correctness certification. To address this, we…
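The hidden-state steering described in the abstract can be illustrated with a generic activation-steering sketch. This is not the VerifySteer algorithm, which the truncated abstract does not specify. It assumes a difference-of-means direction between hidden states collected at verification-paragraph boundaries for rejected versus accepted steps, applied with a single uniform strength alpha. All names here (steering_direction, steer, strictness_score, alpha) are hypothetical, and plain Python lists stand in for model activations.

```python
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_direction(reject_states: List[Vector],
                       accept_states: List[Vector]) -> Vector:
    """Unit-length difference-of-means direction (assumed, not from the paper).

    Points from the mean 'accept' hidden state toward the mean 'reject' one.
    """
    mu_reject = mean_vector(reject_states)
    mu_accept = mean_vector(accept_states)
    diff = [r - a for r, a in zip(mu_reject, mu_accept)]
    norm = sum(d * d for d in diff) ** 0.5
    if norm == 0.0:
        raise ValueError("accept and reject means coincide; no direction")
    return [d / norm for d in diff]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Uniform steering: shift a hidden state by alpha along the direction."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


def strictness_score(hidden: Vector, direction: Vector) -> float:
    """Projection of a hidden state onto the direction (hypothetical proxy)."""
    return sum(h * d for h, d in zip(hidden, direction))


# Toy hidden states, invented purely for illustration.
reject_states = [[1.0, 0.0], [3.0, 0.0]]
accept_states = [[-1.0, 0.0], [-3.0, 0.0]]

direction = steering_direction(reject_states, accept_states)
hidden = [0.5, 0.5]
stricter = steer(hidden, direction, alpha=2.0)
more_lenient = steer(hidden, direction, alpha=-2.0)
```

Under these assumptions, a positive alpha pushes every step toward rejection and a negative alpha pushes every step toward acceptance. That uniform shift is the behavior the abstract says trades error detection against correctness certification.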
