Benchmarking Answer Verification Methods for Question Answering-Based Summarization Evaluation Metrics

Daniel Deutsch; Dan Roth

arXiv:2204.10206 · cs.CL · April 22, 2022

TL;DR

This paper benchmarks lexical, BERTScore, and LERC answer verification for question answering-based summarization metrics. It finds that better verification does not always improve the overall metric, an effect the authors attribute to properties of the datasets.

Contribution

The paper systematically compares lexical, BERTScore, and LERC answer verification methods and examines how each one affects the quality of QA-based summarization evaluation metrics.

Findings

01 LERC outperforms lexical methods in some settings

02 Improved verification does not always improve overall metric quality

03 Dataset properties significantly influence verification impact

Abstract

Question answering-based summarization evaluation metrics must automatically determine whether the QA model's prediction is correct, a task known as answer verification. In this work, we benchmark the lexical answer verification methods that have been used by current QA-based metrics, as well as two more sophisticated text comparison methods, BERTScore and LERC. We find that LERC outperforms the other methods in some settings while remaining statistically indistinguishable from lexical overlap in others. However, our experiments reveal that improved verification performance does not necessarily translate to overall QA-based metric quality: in some scenarios, using a worse verification method (or none at all) performs comparably to using the best verification method, a result that we attribute to properties of the datasets.
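To make the answer verification task concrete, the following is a minimal sketch of lexical verification. It assumes that the lexical methods resemble SQuAD-style exact match and token-level F1; the abstract does not spell out the exact comparisons, so the normalization steps and the function names `normalize`, `exact_match`, and `token_f1` are illustrative assumptions rather than the authors' implementation.

```python
import re
import string
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Assumption: SQuAD-style normalization; the paper may normalize differently.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the normalized answers are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between the predicted and reference answers."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        # Two empty answers count as a match; one empty answer does not.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # The QA model predicted a shorter span than the reference answer.
    print(exact_match("Barack Obama", "President Barack Obama"))
    print(token_f1("Barack Obama", "President Barack Obama"))
```

In this sketch, exact match rejects any prediction that differs from the reference after normalization, while token F1 gives partial credit for overlapping words. Learned comparisons such as BERTScore and LERC are meant to address cases where a correct answer is phrased with different words.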

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques