TL;DR
This paper introduces a word-level method for controlling hallucinations in data-to-text generation: a Multi-Branch Decoder trained with word-level labels derived from co-occurrence and dependency analysis, which improves factual accuracy.
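As a rough illustration of the co-occurrence part of the labeling step, the sketch below marks each reference token as supported (0) or possibly hallucinated (1), depending on whether the token appears in the table. The table format, tokenization, stopword handling and function names are assumptions made for this example. The sketch does not show the dependency-based refinement the paper describes.

```python
def table_vocabulary(table):
    """Collect lowercased words from attribute names and values (hypothetical table format)."""
    vocab = set()
    for attribute, value in table:
        vocab.update(attribute.lower().replace("_", " ").split())
        vocab.update(value.lower().split())
    return vocab


def cooccurrence_labels(table, reference_tokens, stopwords=frozenset()):
    """Label each reference token 0 (supported by the table) or 1 (possibly hallucinated).

    Punctuation and stopwords are treated as supported, since they carry no facts.
    """
    vocab = table_vocabulary(table)
    labels = []
    for token in reference_tokens:
        word = token.lower()
        supported = word in vocab or word in stopwords or not word.isalnum()
        labels.append(0 if supported else 1)
    return labels


# Toy WikiBio-style example: the table says nothing about nationality.
table = [("name", "John Smith"), ("occupation", "painter"), ("birth_date", "1950")]
reference = "John Smith ( born 1950 ) is an American painter .".split()
labels = cooccurrence_labels(table, reference, stopwords={"is", "an", "a", "the"})
print(list(zip(reference, labels)))
```

The sketch correctly flags "American" as unsupported. It also flags "born", even though that word only paraphrases birth_date. Pure co-occurrence makes errors of this kind, and a further analysis step, such as the dependency analysis mentioned above, would have to correct them.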
Contribution
It presents a novel word-level approach with a Multi-Branch Decoder for controlling hallucinations, outperforming previous instance-level methods in data-to-text generation.
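One way to picture a multi-branch decoder is as a set of decoding branches whose next-token distributions are combined with per-step weights. During training, the weights can follow the word-level labels. At inference time, the weight on the branch that absorbs hallucinated tokens can be lowered. The sketch below makes three simplifying assumptions for illustration:

- There are two branches.
- Branches are mixed at the level of output probabilities.
- A hard, hypothetical mapping turns each label into branch weights.

The paper's decoder may use more branches and may combine them differently, for example at the hidden-state level.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mix_branches(branch_logits, weights):
    """Combine per-branch next-token distributions with non-negative weights.

    branch_logits: one list of vocabulary logits per branch (hypothetical layout).
    weights: one weight per branch; normalized here so the result sums to 1.
    """
    if len(branch_logits) != len(weights):
        raise ValueError("need exactly one weight per branch")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [softmax(logits) for logits in branch_logits]
    vocab_size = len(probs[0])
    return [
        sum(w * p[i] for w, p in zip(weights, probs)) / total
        for i in range(vocab_size)
    ]


def training_weights(label):
    """Hypothetical hard mapping from a word label to [content, hallucination] weights."""
    return [0.0, 1.0] if label == 1 else [1.0, 0.0]


# Toy vocabulary: ["painter", "american"].
content_logits = [2.0, 0.0]        # content branch prefers the table-supported word
hallucination_logits = [0.0, 2.0]  # hallucination branch prefers the unsupported word

# A token labeled as hallucinated sends its training signal to the hallucination branch.
print(training_weights(1))  # [0.0, 1.0]

# At inference, switching the hallucination branch off keeps only the content branch.
controlled = mix_branches([content_logits, hallucination_logits], [1.0, 0.0])
print([round(p, 4) for p in controlled])  # [0.8808, 0.1192]
```

Intermediate weights, such as [0.5, 0.5], interpolate between the two behaviors. Under this framing, "control" means choosing the weights at decoding time.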
Findings
Reduces hallucinations while maintaining fluency.
Effective in noisy and imperfect data settings.
Validated on WikiBio and ToTTo benchmarks.
Abstract
Data-to-Text Generation (DTG) is a subfield of Natural Language Generation that aims to transcribe structured data into natural language descriptions. The field has recently been boosted by neural generators. On the one hand, these models exhibit strong syntactic skills without the need for hand-crafted pipelines. On the other hand, the quality of their output reflects the quality of the training data, which in realistic settings offers only imperfectly aligned structure-text pairs. Consequently, state-of-the-art neural models include misleading statements (usually called hallucinations) in their outputs. Controlling this phenomenon is currently a major challenge for DTG, and it is the problem addressed in this paper. Previous work deals with this issue at the instance level, using an alignment score for each table-reference pair. In contrast, we propose a finer-grained approach, arguing…
