Semantic Noise Matters for Neural Natural Language Generation

Ondřej Dušek; David M. Howcroft; Verena Rieser

arXiv:1911.03905 · cs.CL · November 12, 2019

TL;DR

This paper investigates how semantic noise in training data affects neural natural language generation systems. It shows that training on cleaned data improves semantic correctness by up to 97% while maintaining fluency, and that omitted information, rather than hallucination, is the most common error.

Contribution

It analyzes the impact of semantic noise on state-of-the-art NNLG models that implement different semantic control mechanisms, and shows that data cleaning is an effective remedy.

Findings

1. Semantic noise reduces correctness in NNLG models.
2. Cleaning data improves semantic correctness by up to 97%.
3. Omission errors are more common than hallucinations.

Abstract

Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.
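
The distinction between omission and hallucination can be made concrete with a small sketch. It assumes that the input specification and the generated text can each be reduced to a set of slot-value pairs; the function name `classify_slot_errors` and the example slots are hypothetical illustrations, not the authors' evaluation code.

```python
from typing import Dict, Set, Tuple

# A slot is an (attribute, value) pair, e.g. ("food", "French").
Slot = Tuple[str, str]


def classify_slot_errors(mr: Set[Slot], realized: Set[Slot]) -> Dict[str, Set[Slot]]:
    """Compare the input slots with the slots detected in the output text.

    Hypothetical helper: how slots are detected in the text is out of scope here.
    A slot realized with a wrong value shows up in both sets under this scheme.
    """
    return {
        "omitted": mr - realized,        # requested in the input but missing from the text
        "hallucinated": realized - mr,   # present in the text but never requested
    }


# Made-up example input and the slots supposedly found in a generated sentence.
mr = {("name", "The Eagle"), ("food", "French"), ("area", "riverside")}
realized = {("name", "The Eagle"), ("food", "French"), ("priceRange", "cheap")}

errors = classify_slot_errors(mr, realized)
print(errors["omitted"])       # the area slot was dropped
print(errors["hallucinated"])  # the price range was invented
```

Counting the two sets separately is what allows a statement such as "omission is more common than hallucination" to be checked on a per-slot basis.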

Code & Models

Repositories

tuetschek/e2e-cleaning
none · Official
