Challenges in Data-to-Document Generation
Sam Wiseman, Stuart M. Shieber, Alexander M. Rush

TL;DR
This paper explores the challenges of generating detailed descriptive documents from data records, highlighting current neural models' limitations and proposing new evaluation methods and baselines.
Contribution
The paper introduces a large-scale dataset for data-to-document generation, evaluates existing neural approaches on it, and shows that copy- and reconstruction-based extensions improve their results.
Findings
Neural models produce fluent but often inaccurate documents.
Templated baselines outperform neural models on some metrics.
Copy- and reconstruction-based extensions improve performance.
Abstract
Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
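The abstract mentions extractive evaluation without giving details. The general idea can be sketched as follows: pull (entity, type, value) relations out of a generated document and check how many of them are supported by the source records. This is a minimal illustration under stated assumptions, not the paper's metric or extractor. The `Record` type, the `relation_precision` function, and the sample values are all hypothetical, and the relations are assumed to have been extracted already.

```python
from typing import Iterable, NamedTuple


class Record(NamedTuple):
    """A single data record (hypothetical schema for illustration)."""
    entity: str
    rtype: str
    value: str


def relation_precision(extracted: Iterable[Record],
                       source_records: Iterable[Record]) -> float:
    """Fraction of unique extracted relations that appear in the source records."""
    extracted_set = set(extracted)
    if not extracted_set:
        # No relations were extracted, so nothing is supported.
        return 0.0
    supported = extracted_set & set(source_records)
    return len(supported) / len(extracted_set)


# Made-up sample data: one extracted relation matches the records, one does not.
source = [Record("Team A", "PTS", "100"), Record("Player X", "REB", "7")]
extracted = [Record("Team A", "PTS", "100"), Record("Player X", "REB", "9")]
score = relation_precision(extracted, source)
```

A score like this rewards factual agreement with the data rather than surface overlap with a reference text, which is why a fluent but inaccurate document can score poorly on it.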
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
