TL;DR
This paper introduces a rigorous human evaluation methodology for assessing the accuracy of data-to-text systems, demonstrated through basketball summaries, and used to validate automated metrics.
Contribution
It presents a novel gold-standard human evaluation approach for data-to-text accuracy and applies it to basketball summaries to benchmark automated metrics.
Findings
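The methodology provides high-quality human assessments of the accuracy of generated texts.
The resulting gold-standard evaluation can be used to validate automated accuracy metrics.
The approach is intended to serve as a gold standard for accuracy evaluations of data-to-text systems.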
Abstract
Most Natural Language Generation systems need to produce accurate texts. We propose a methodology for high-quality human evaluation of the accuracy of generated texts, which is intended to serve as a gold standard for accuracy evaluations of data-to-text systems. We use our methodology to evaluate the accuracy of computer-generated basketball summaries. We then show how our gold-standard evaluation can be used to validate automated metrics.
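One way to picture validating an automated metric against a gold standard is to compare the errors the metric flags with the errors human annotators marked, and report precision and recall. The sketch below is a minimal illustration under stated assumptions, not the paper's protocol: the `(document_id, token_index)` error representation, the function name `precision_recall`, and the toy data are all hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: each error is a (document_id, token_index) pair.
Error = Tuple[str, int]


def precision_recall(gold: Set[Error], predicted: Set[Error]) -> Tuple[float, float]:
    """Score a metric's flagged errors against gold-standard human annotations."""
    true_positives = len(gold & predicted)
    # Guard against empty sets so the function never divides by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Made-up toy data, for illustration only (not results from the paper).
gold_errors = {("game1", 4), ("game1", 17), ("game2", 8)}
metric_errors = {("game1", 4), ("game2", 8), ("game2", 30), ("game3", 2)}

precision, recall = precision_recall(gold_errors, metric_errors)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching on error position is the simplest possible choice here; a real comparison might need to align error spans or account for error categories.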
