TL;DR
This paper investigates how semantic noise affects neural natural language generation (NNLG) systems. It demonstrates that data cleaning significantly improves semantic accuracy and identifies omission errors, rather than hallucinations, as the primary issue.
Contribution
It provides the first comprehensive analysis of how semantic noise affects NNLG models with various semantic control mechanisms, and shows that data cleaning is an effective solution.
Findings
Semantic noise reduces the semantic correctness of NNLG model outputs.
Cleaning the data improves semantic correctness by up to 97% while maintaining fluency.
Omission errors are more common than hallucinations.
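The findings separate omissions (input content the output drops) from hallucinations (content the output adds). Below is a minimal sketch of how such errors could be counted for slot-based meaning representations (MRs). It assumes a restaurant-style MR of slot-value pairs and a hypothetical lexicon matched by plain substring search. The names `LEXICON`, `count_errors`, and `clean_mr` are illustrative, not the paper's code. `clean_mr` shows one assumed form of data cleaning, in which the MR is re-derived from the reference text so that input and reference agree.

```python
from dataclasses import dataclass

# Hypothetical lexicon: every value each slot can take, matched by plain
# case-insensitive substring search. Real evaluations typically use
# hand-written regular expressions per slot value instead.
LEXICON = {
    "name": ["The Eagle", "The Mill"],
    "food": ["Italian", "French"],
    "area": ["riverside", "city centre"],
}


@dataclass
class SlotErrors:
    missed: int  # omissions: MR slots not found in the text
    added: int   # hallucinations: slot values found in the text but not in the MR
    total: int   # number of slots in the input MR

    @property
    def ser(self) -> float:
        """Slot error rate: (missed + added) / total input slots."""
        return (self.missed + self.added) / self.total if self.total else 0.0


def realized_slots(text: str, lexicon: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Return every (slot, value) pair whose value string occurs in the text."""
    lowered = text.lower()
    return {
        (slot, value)
        for slot, values in lexicon.items()
        for value in values
        if value.lower() in lowered
    }


def count_errors(mr: dict[str, str], text: str, lexicon: dict[str, list[str]]) -> SlotErrors:
    """Compare an input MR against the slots realized in a generated text.

    In this simplified scheme a wrong value (e.g. the MR says Italian but the
    text says French) counts once as missed and once as added.
    """
    expected = set(mr.items())
    found = realized_slots(text, lexicon)
    return SlotErrors(
        missed=len(expected - found),
        added=len(found - expected),
        total=len(expected),
    )


def clean_mr(text: str, lexicon: dict[str, list[str]]) -> dict[str, str]:
    """Hypothetical cleaning step: re-derive the MR from the reference text.

    If a slot matches several values, the alphabetically last one wins.
    """
    return dict(sorted(realized_slots(text, lexicon)))


if __name__ == "__main__":
    mr = {"name": "The Eagle", "food": "Italian", "area": "riverside"}
    output = "The Eagle serves Italian food."
    errors = count_errors(mr, output, LEXICON)
    print(errors.missed, errors.added, round(errors.ser, 3))  # 1 0 0.333
    cleaned = clean_mr(output, LEXICON)
    print(cleaned)  # {'food': 'Italian', 'name': 'The Eagle'}
    print(count_errors(cleaned, output, LEXICON).ser)  # 0.0
```

Substring matching is a deliberate simplification. It would miss paraphrases such as "by the river" for `riverside`, so a practical evaluator needs richer patterns per slot value.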
Abstract
Neural natural language generation (NNLG) systems are known for their pathological outputs, i.e. generating text which is unrelated to the input specification. In this paper, we show the impact of semantic noise on state-of-the-art NNLG models which implement different semantic control mechanisms. We find that cleaned data can improve semantic correctness by up to 97%, while maintaining fluency. We also find that the most common error is omitting information, rather than hallucination.
