A Gold Standard Methodology for Evaluating Accuracy in Data-To-Text Systems

Craig Thomson; Ehud Reiter

arXiv:2011.03992 · cs.CL · November 10, 2020

TL;DR

This paper introduces a rigorous human evaluation methodology for assessing the accuracy of data-to-text systems, demonstrates it on computer-generated basketball summaries, and uses the resulting gold standard to validate automated metrics.

Contribution

The paper presents a novel gold-standard human evaluation approach for data-to-text accuracy, applies it to basketball summaries, and shows how the resulting annotations can be used to validate automated metrics.

Findings

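01

The methodology is designed to yield high-quality human assessments of the accuracy of generated texts.

02

The gold-standard human evaluation can be used to validate automated accuracy metrics.

03

The approach is intended to serve as a gold standard for accuracy evaluations of data-to-text systems.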

Abstract

Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold standard evaluation can be used to validate automated metrics.
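One common way to validate an automated metric against a gold standard is to treat the human-annotated errors as ground truth and measure how many of them the metric recovers. The sketch below illustrates that idea only; it is not taken from the paper or its repository. The error representation (a document id paired with a token index), the function name `precision_recall`, and the toy data are all assumptions made for this example.

```python
from typing import Set, Tuple

# An error is identified here by (document id, token index).
# This representation is an assumption made for illustration only.
Error = Tuple[str, int]


def precision_recall(gold: Set[Error], flagged: Set[Error]) -> Tuple[float, float]:
    """Return (precision, recall) of metric-flagged errors against gold annotations."""
    true_positives = len(gold & flagged)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data, invented for this example (not results from the paper).
gold_errors = {("game1", 4), ("game1", 17), ("game2", 9)}
metric_flags = {("game1", 4), ("game2", 9), ("game2", 30), ("game2", 31)}

precision, recall = precision_recall(gold_errors, metric_flags)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching on token position is the simplest possible alignment; a real comparison would likely need a more forgiving way to match a metric's output to annotated error spans.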

Code & Models

Repositories

nlgcat/evaluating_accuracy (official)