Shared Task on Evaluating Accuracy in Natural Language Generation
Ehud Reiter, Craig Thomson

TL;DR
This paper introduces a shared task on evaluating the accuracy of texts produced by natural language generation (NLG) systems, using basketball game summaries generated from box score data as the test case.
Contribution
The paper proposes a shared task in which participants apply their own methodologies and algorithms to measure the accuracy of NLG output, so that different evaluation approaches can be compared on the same texts.
Findings
Establishes a standardized evaluation framework
Encourages development of improved accuracy measurement methods
Facilitates comparison of NLG system performance
Abstract
We propose a shared task on methodologies and algorithms for evaluating the accuracy of generated texts. Participants will measure the accuracy of basketball game summaries produced by NLG systems from basketball box score data.
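To make the idea of measuring accuracy against box score data concrete, here is a minimal Python sketch. It is not the shared task's methodology or any participant's system: the box score layout, the player names, the regular expression, and the helper names `check_numeric_claims` and `accuracy` are all assumptions made up for illustration. It only checks simple numeric claims of one fixed phrasing against a lookup table.

```python
import re

# Hypothetical box score: player -> stat -> value (illustrative data, not from the paper).
box_score = {
    "Player A": {"points": 24, "rebounds": 10},
    "Player B": {"points": 15, "rebounds": 4},
}

# Assumed claim phrasing: "<Name> scored <N> points" or "<Name> grabbed <N> rebounds".
CLAIM_PATTERN = re.compile(r"(Player [A-Z]) (?:scored|grabbed) (\d+) (points|rebounds)")


def check_numeric_claims(summary, box):
    """Return (player, stat, claimed, actual, is_correct) for each matched claim."""
    results = []
    for player, value, stat in CLAIM_PATTERN.findall(summary):
        claimed = int(value)
        actual = box.get(player, {}).get(stat)  # None if the player or stat is missing
        results.append((player, stat, claimed, actual, actual == claimed))
    return results


def accuracy(results):
    """Fraction of checked claims that agree with the box score, or None if no claims."""
    if not results:
        return None
    return sum(1 for r in results if r[4]) / len(results)


summary = "Player A scored 24 points. Player B grabbed 6 rebounds."
results = check_numeric_claims(summary, box_score)
for player, stat, claimed, actual, ok in results:
    print(f"{player} {stat}: claimed {claimed}, box score {actual}, correct={ok}")
print("accuracy:", accuracy(results))
```

A pattern-matching check like this only covers errors in explicitly stated numbers. Real summaries can also contain wrong names, wrong words, or claims that cannot be verified from the box score at all, which is part of why the choice of evaluation method is left open to participants.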
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
