Accurate and Nuanced Open-QA Evaluation Through Textual Entailment

Peiran Yao; Denilson Barbosa

arXiv:2405.16702 · cs.CL · May 28, 2024 · 1 citation

TL;DR

This paper introduces an entailment-based evaluation method for open-domain question answering that aligns more closely with human judgment by assessing semantic relations between answers, providing nuanced scoring without additional training.

Contribution

It proposes a learning-free, entailment-based evaluation approach that improves the accuracy and nuance of answer correctness assessment in open-QA systems.

Findings

01. Higher AUC than existing methods
02. Closer alignment with human judgment
03. Nuanced scoring of answer correctness
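
To make the AUC finding concrete, here is a minimal sketch of how a graded score can be compared against binary human judgments. It assumes that AUC here means area under the ROC curve, computed by pairwise comparison; the function name `roc_auc` and the toy numbers are illustrative and are not taken from the paper.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a human-accepted answer outranks a human-rejected one.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (made up for illustration): 1 = humans judged the answer correct.
human = [1, 1, 0, 0]
binary_scores = [1.0, 0.0, 0.0, 0.0]   # an all-or-nothing evaluator
graded_scores = [1.0, 0.5, 0.0, 0.5]   # an evaluator that gives partial marks

print(roc_auc(binary_scores, human))
print(roc_auc(graded_scores, human))
```

In this toy case the graded scores separate the second accepted answer from one of the rejected ones, which a binary evaluator cannot do; whether that holds on real data is what the paper's experiments measure.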

Abstract

Open-domain question answering (Open-QA) is a common task for evaluating large language models (LLMs). However, current Open-QA evaluations are criticized for the ambiguity in questions and the lack of semantic understanding in evaluators. Complex evaluators, powered by foundation models or LLMs and pertaining to semantic equivalence, still deviate from human judgments by a large margin. We propose to study the entailment relations of answers to identify more informative and more general system answers, offering a much closer evaluation to human judgment on both NaturalQuestions and TriviaQA while being learning-free. The entailment-based evaluation we propose allows the assignment of bonus or partial marks by quantifying the inference gap between answers, enabling a nuanced ranking of answer correctness that has higher AUC than current methods.
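
The idea of checking entailment in both directions between a system answer and a gold answer can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the statement template, the category names, the mark values, and the `toy_judge` function are all hypothetical. In practice the judge would be an NLI model or an LLM prompt rather than a token-overlap check.

```python
import re
from typing import Callable

# Hypothetical interface: judge(premise, hypothesis) -> True if premise entails hypothesis.
EntailmentJudge = Callable[[str, str], bool]

# Illustrative marks only; the source does not specify these values.
MARKS = {
    "equivalent": 1.0,
    "more_informative": 1.0,
    "more_general": 0.5,
    "unrelated": 0.0,
}


def classify_answer(question: str, gold: str, system: str, judge: EntailmentJudge) -> str:
    """Label the system answer by the direction of entailment with the gold answer."""
    # Assumed template that turns a short answer into a full statement.
    gold_stmt = f"The answer to '{question}' is {gold}."
    sys_stmt = f"The answer to '{question}' is {system}."
    sys_entails_gold = judge(sys_stmt, gold_stmt)
    gold_entails_sys = judge(gold_stmt, sys_stmt)
    if sys_entails_gold and gold_entails_sys:
        return "equivalent"
    if sys_entails_gold:
        return "more_informative"  # system answer says at least as much as gold
    if gold_entails_sys:
        return "more_general"      # system answer is a weaker version of gold
    return "unrelated"


def toy_judge(premise: str, hypothesis: str) -> bool:
    """Crude stand-in for an entailment model: hypothesis tokens must all occur in premise."""
    def tokens(text: str) -> set:
        return set(re.findall(r"\w+", text.lower()))
    return tokens(hypothesis) <= tokens(premise)


question = "Where is the Eiffel Tower?"
for gold, system in [("Paris", "Paris, France"), ("Paris, France", "France"), ("Paris", "London")]:
    label = classify_answer(question, gold, system, toy_judge)
    print(gold, "|", system, "->", label, MARKS[label])
```

Keeping the judge as a plain callable means the scoring logic stays learning-free: swapping in a stronger entailment model changes the labels but not the code around it.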

Code & Models

Repositories

U-Alberta/QA-partial-marks
PyTorch · Official

Videos

Accurate and Nuanced Open-QA Evaluation Through Textual Entailment · Underline

Taxonomy

Topics: Text and Document Classification Technologies · Advanced Text Analysis Techniques