What's in a Name? Answer Equivalence For Open-Domain Question Answering

Chenglei Si; Chen Zhao; Jordan Boyd-Graber

arXiv:2109.05289·cs.CL·September 14, 2021

TL;DR

QA annotations usually provide a single gold answer, so predictions that are correct but worded differently are scored as wrong. This paper mines alias entities from knowledge bases and adds them as equivalent gold answers, both when evaluating models and when training them.

Contribution

It introduces a method to incorporate alias answers into QA evaluation and training, enhancing answer recognition without requiring new annotations.

Findings

01. Answer expansion increases exact match scores on all three benchmarks (Natural Questions, TriviaQA, and SQuAD).

02. Incorporating alias answers helps model training on real-world datasets.

03. A post hoc human evaluation confirms that the additional answers are valid.

Abstract

A flaw in QA evaluation is that annotations often only provide one gold answer. Thus, model predictions semantically equivalent to the answer but superficially different are considered incorrect. This work explores mining alias entities from knowledge bases and using them as additional gold answers (i.e., equivalent answers). We incorporate answers for two settings: evaluation with additional answers and model training with equivalent answers. We analyse three QA benchmarks: Natural Questions, TriviaQA, and SQuAD. Answer expansion increases the exact match score on all datasets for evaluation, while incorporating it helps model training over real-world datasets. We ensure the additional answers are valid through a human post hoc evaluation.
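The evaluation setting described in the abstract can be illustrated with a minimal sketch. It assumes SQuAD-style answer normalization and a small hand-written alias table standing in for a knowledge-base lookup; `ALIAS_TABLE`, `expand_gold_answers`, and `exact_match` are hypothetical names chosen for illustration and are not taken from the paper or its repository.

```python
import re
import string

# Hypothetical alias table standing in for a knowledge-base lookup.
# Keys are normalized gold answers; values are alias strings.
ALIAS_TABLE = {
    "new york city": ["NYC", "New York, New York"],
}


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def expand_gold_answers(gold_answers, alias_table):
    """Return the gold answers plus any aliases found in the table."""
    expanded = list(gold_answers)
    for answer in gold_answers:
        for alias in alias_table.get(normalize_answer(answer), []):
            if alias not in expanded:
                expanded.append(alias)
    return expanded


def exact_match(prediction: str, gold_answers) -> bool:
    """True if the normalized prediction equals any normalized gold answer."""
    pred = normalize_answer(prediction)
    return any(pred == normalize_answer(gold) for gold in gold_answers)


gold = ["New York City"]
expanded_gold = expand_gold_answers(gold, ALIAS_TABLE)

print(exact_match("NYC", gold))           # scored against the original gold only
print(exact_match("NYC", expanded_gold))  # scored against gold plus aliases
```

Under these assumptions, the prediction "NYC" fails exact match against the single annotated answer but passes once the alias set is included, which is the kind of gap that answer expansion is meant to close.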

Code & Models

Repositories

noviscl/answerequiv
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: High-Order Consensuses