Rationale-Aware Answer Verification by Pairwise Self-Evaluation
Akira Kawabata, Saku Sugawara

TL;DR
This paper introduces REPS, a method to improve answer verification by selecting valid rationales through pairwise self-evaluation, leading to more reliable verifiers in reasoning tasks.
Contribution
It proposes REPS, a novel rationale selection technique that enhances verifier training by focusing on rationale validity without extra human labeling.
Findings
Verifiers trained on REPS-selected rationales outperform baseline methods.
Only 19% of LLM solutions with correct answers have valid rationales in StrategyQA.
REPS improves verifier accuracy across multiple reasoning benchmarks.
Abstract
Answer verification identifies correct solutions among candidates generated by large language models (LLMs). Current approaches typically train verifier models by labeling solutions as correct or incorrect based solely on whether the final answer matches the gold answer. However, this labeling ignores cases where a flawed rationale happens to yield the correct answer, undermining the verifier's ability to distinguish between sound and flawed rationales. We empirically show that in StrategyQA, only 19% of LLM-generated solutions with correct answers have valid rationales, so a verifier trained on such labels is unreliable. Furthermore, we demonstrate that training a verifier on valid rationales significantly improves its ability to distinguish between valid and flawed rationales. To make a better verifier without extra human supervision, we introduce REPS (Rationale Enhancement through Pairwise Selection), a…
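The idea of selecting a rationale by pairwise comparison can be illustrated with a minimal sketch. This is not the authors' implementation: the single-elimination bracket, the `Solution` type, the `select_rationale` function, and the `toy_judge` stand-in (which simply prefers the longer rationale) are all assumptions made for illustration. In practice the judge would be a call to the LLM that compares two rationales.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Solution:
    rationale: str
    answer: str


# Hypothetical judge signature: returns True if the first solution's
# rationale is preferred over the second's.
Judge = Callable[[Solution, Solution], bool]


def answer_matches(sol: Solution, gold: str) -> bool:
    """Answer-based labeling: compare only the final answer to the gold answer."""
    return sol.answer.strip().lower() == gold.strip().lower()


def select_rationale(
    candidates: List[Solution], gold: str, prefer_first: Judge
) -> Optional[Solution]:
    """Keep answer-correct candidates, then pick one by pairwise comparison.

    Assumption: a single-elimination bracket is used here for simplicity.
    """
    pool = [s for s in candidates if answer_matches(s, gold)]
    if not pool:
        return None
    while len(pool) > 1:
        next_round = []
        # Compare adjacent pairs; the preferred solution advances.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if prefer_first(a, b) else b)
        # An unpaired last candidate advances without a comparison.
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])
        pool = next_round
    return pool[0]


def toy_judge(a: Solution, b: Solution) -> bool:
    """Placeholder judge (assumption): prefers the longer rationale."""
    return len(a.rationale) >= len(b.rationale)


if __name__ == "__main__":
    cands = [
        Solution("short", "yes"),
        Solution("a much longer chain of reasoning", "Yes"),
        Solution("wrong path", "no"),
        Solution("medium length", "yes"),
    ]
    best = select_rationale(cands, "yes", toy_judge)
    print(best.rationale if best else None)
```

The selected solution could then serve as a positive training example for the verifier, in place of every answer-correct candidate.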
Taxonomy
Topics: Topic Modeling · Seismology and Earthquake Studies · Access Control and Trust
