'Tis but Thy Name: Semantic Question Answering Evaluation with 11M Names for 1M Entities

Albert Huang

arXiv:2202.13581 · cs.CL · March 1, 2022

TL;DR

This paper introduces WES, a large-scale semantic entity similarity dataset built from Wikipedia link texts, so that QA evaluation metrics can judge semantic correctness rather than lexical overlap.

Contribution

The paper presents WES, an 11-million-example dataset for semantic entity similarity, tailored to QA evaluation, and shows that a basic cross-encoder metric predicts human judgments of correctness better than four classic metrics.

Findings

1. WES labels align well with human judgments: annotators consistently agree with them.

2. A basic cross encoder outperforms four classic metrics at predicting human judgments of correctness.

3. WES enables more accurate semantic evaluation in QA systems.

Abstract

Classic lexical-matching-based QA metrics are slowly being phased out because they punish succinct or informative outputs just because those answers were not provided as ground truth. Recently proposed neural metrics can evaluate semantic similarity but were trained on small textual similarity datasets grafted from foreign domains. We introduce the Wiki Entity Similarity (WES) dataset, an 11M example, domain targeted, semantic entity similarity dataset that is generated from link texts in Wikipedia. WES is tailored to QA evaluation: the examples are entities and phrases, grouped into semantic clusters to simulate multiple ground-truth labels. Human annotators consistently agree with WES labels, and a basic cross encoder metric is better than four classic metrics at predicting human judgments of correctness.
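To illustrate why lexical-matching metrics penalize correct answers that are missing from the ground truth, here is a minimal sketch of two common lexical metrics, exact match and token-level F1. This is not the paper's code: the function names, the normalization rule, and the example strings are assumptions chosen for illustration only.

```python
import re
from collections import Counter
from typing import List


def normalize(text: str) -> List[str]:
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()


def exact_match(prediction: str, references: List[str]) -> bool:
    """True if the normalized prediction equals any normalized reference."""
    pred = normalize(prediction)
    return any(pred == normalize(ref) for ref in references)


def token_f1(prediction: str, references: List[str]) -> float:
    """Best token-overlap F1 between the prediction and any reference."""
    pred = normalize(prediction)
    best = 0.0
    for ref in references:
        gold = normalize(ref)
        # Multiset intersection counts each shared token at most as often
        # as it appears in both strings.
        overlap = sum((Counter(pred) & Counter(gold)).values())
        if overlap == 0:
            continue
        precision = overlap / len(pred)
        recall = overlap / len(gold)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Hypothetical example: one ground-truth label, several plausible answers.
single_label = ["New York City"]
for answer in ["NYC", "the Big Apple", "New York"]:
    print(answer, exact_match(answer, single_label), token_f1(answer, single_label))

# A cluster of names for the same entity acts as multiple ground-truth labels.
cluster = ["New York City", "NYC"]
print("NYC", exact_match("NYC", cluster))
```

With a single label, "NYC" and "the Big Apple" share no tokens with "New York City", so both metrics score them as wrong even though a human would accept them. Adding more names for the same entity to the reference list rescues "NYC" but not "the Big Apple", which is the gap a learned semantic similarity metric is meant to close.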

Code & Models

Datasets

Exr0n/wiki-entity-similarity
dataset· 165 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification