Rationale-Aware Answer Verification by Pairwise Self-Evaluation

Akira Kawabata; Saku Sugawara

arXiv:2410.04838 · cs.CL · October 28, 2024

TL;DR

This paper introduces REPS, a method to improve answer verification by selecting valid rationales through pairwise self-evaluation, leading to more reliable verifiers in reasoning tasks.
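The summary does not say how pairwise judgments are combined into a single selection. The following is a minimal sketch under stated assumptions. It uses a single-elimination (knockout) tournament over candidate solutions that already reach the gold answer, and it replaces the LLM's self-evaluation with a hypothetical comparator `prefer(a, b)`. The names `knockout_select` and `toy_prefer` are illustrative and are not taken from the REPS code, and the aggregation scheme REPS actually uses may differ.

```python
from typing import Callable, List, Sequence

# Hypothetical comparator: returns True if the first rationale is judged more
# valid than the second. In REPS this judgment comes from the LLM itself
# (pairwise self-evaluation); here it is any Python callable.
Comparator = Callable[[str, str], bool]


def knockout_select(candidates: Sequence[str], prefer: Comparator) -> str:
    """Pick one rationale by repeated pairwise comparison (single elimination).

    Assumptions: all candidates already reach the gold answer, and an unpaired
    candidate in a round advances without a comparison (a bye).
    """
    if not candidates:
        raise ValueError("need at least one candidate")
    pool: List[str] = list(candidates)
    while len(pool) > 1:
        next_round: List[str] = []
        # Compare neighbours two at a time; the preferred one advances.
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if prefer(a, b) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # bye for the unpaired candidate
        pool = next_round
    return pool[0]


def toy_prefer(a: str, b: str) -> bool:
    # Toy stand-in for an LLM judgment: a rationale containing "guess" loses
    # to one that does not; otherwise the first argument wins.
    return ("guess" in a) <= ("guess" in b)


best = knockout_select(
    ["I guess yes", "Step 1: bats are mammals, so no", "guess again"],
    toy_prefer,
)
```

A knockout over n candidates makes exactly n - 1 comparisons, because each comparison eliminates one candidate. The number of judge calls therefore grows linearly with the candidate pool rather than quadratically, as it would for an all-pairs comparison.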

Contribution

It proposes REPS, a novel rationale selection technique that enhances verifier training by focusing on rationale validity without extra human labeling.
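The following minimal sketch contrasts answer-only labels with rationale-aware labels. `outcome_labels` implements the answer-only baseline described in the abstract. `selection_labels` is a hypothetical rationale-aware variant: wrong-answer solutions become negatives, and only the correct-answer solution chosen by a `select` callback (standing in for REPS) becomes a positive. The `Solution` type, both function names, the one-positive-per-question rule, and the toy selector are assumptions for illustration, not the paper's exact training recipe.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Solution:
    rationale: str
    answer: str


Labeled = List[Tuple[Solution, int]]


def outcome_labels(solutions: Sequence[Solution], gold: str) -> Labeled:
    # Baseline: label 1 iff the final answer matches the gold answer,
    # regardless of whether the rationale is sound.
    return [(s, int(s.answer == gold)) for s in solutions]


def selection_labels(
    solutions: Sequence[Solution],
    gold: str,
    select: Callable[[Sequence[Solution]], Solution],
) -> Labeled:
    # Hypothetical rationale-aware scheme: wrong-answer solutions are negatives;
    # among correct-answer solutions, only the one chosen by `select`
    # becomes a positive. The remaining correct-answer solutions are dropped.
    correct = [s for s in solutions if s.answer == gold]
    data: Labeled = [(s, 0) for s in solutions if s.answer != gold]
    if correct:
        data.append((select(correct), 1))
    return data


# Usage on a toy question ("Do bats lay eggs?", gold answer "no") with a toy
# selector that simply takes the first correct-answer solution.
sols = [
    Solution("Bats are mammals; mammals do not lay eggs; so no", "no"),
    Solution("Bats can fly, so no", "no"),
    Solution("Bats are birds, so yes", "yes"),
]
baseline = outcome_labels(sols, "no")
aware = selection_labels(sols, "no", lambda cands: cands[0])
```

Under the baseline, the flawed "Bats can fly, so no" solution is labeled positive because its final answer is correct. The selection-based variant keeps it out of the positive set.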

Findings

01. Verifiers trained on REPS-selected rationales outperform baseline methods.
02. In StrategyQA, only 19% of LLM-generated solutions with correct answers have valid rationales.
03. REPS improves verifier accuracy across multiple reasoning benchmarks.

Abstract

Answer verification identifies correct solutions among candidates generated by large language models (LLMs). Current approaches typically train verifier models by labeling solutions as correct or incorrect based solely on whether the final answer matches the gold answer. However, this approach overlooks flawed rationales in solutions that nonetheless reach the correct answer, which undermines the verifier's ability to distinguish between sound and flawed rationales. We empirically show that in StrategyQA, only 19% of LLM-generated solutions with correct answers have valid rationales, which leads to an unreliable verifier. Furthermore, we demonstrate that training a verifier on valid rationales significantly improves its ability to distinguish valid from flawed rationales. To build a better verifier without extra human supervision, we introduce REPS (Rationale Enhancement through Pairwise Selection), a…
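At inference time, answer verification is commonly framed as best-of-N reranking: sample several solutions, score each with the verifier, and return the top-scoring one. The following is a minimal sketch that assumes the verifier is exposed as a hypothetical `score(rationale) -> float` callable. `verify_best_of_n`, `selection_accuracy`, and the semicolon-counting `toy_score` are illustrative stand-ins, not the paper's verifier or evaluation code.

```python
from typing import Callable, List, Sequence, Tuple

# Illustrative layout: (rationale, final_answer).
Candidate = Tuple[str, str]


def verify_best_of_n(
    candidates: Sequence[Candidate], score: Callable[[str], float]
) -> Candidate:
    """Return the candidate whose rationale the verifier scores highest.

    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    if not candidates:
        raise ValueError("no candidates to verify")
    return max(candidates, key=lambda c: score(c[0]))


def selection_accuracy(
    problems: Sequence[Sequence[Candidate]],
    gold_answers: Sequence[str],
    score: Callable[[str], float],
) -> float:
    # Fraction of problems where the verifier's pick has the gold final answer.
    hits = sum(
        verify_best_of_n(cands, score)[1] == gold
        for cands, gold in zip(problems, gold_answers)
    )
    return hits / len(gold_answers)


def toy_score(rationale: str) -> float:
    # Toy verifier: more ';'-separated steps means a higher score.
    return float(rationale.count(";"))


problems: List[List[Candidate]] = [
    [("Bats fly, so yes", "yes"),
     ("Bats are mammals; mammals do not lay eggs; so no", "no")],
    # Here the toy scorer prefers a longer rationale with the wrong answer.
    [("Paris is in France; so yes", "yes"), ("No", "no")],
]
acc = selection_accuracy(problems, ["no", "no"], toy_score)
```

The toy scorer picks the correct solution for the first problem but a wrong one for the second, so `acc` is 0.5. What the verifier has learned to reward determines which candidate it returns.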

Code & Models

Repositories

akirakawabata/reps (official)

Videos

Rationale-Aware Answer Verification by Pairwise Self-Evaluation (Underline)