Challenges in Data-to-Document Generation

Sam Wiseman; Stuart M. Shieber; Alexander M. Rush

arXiv:1707.08052 · cs.CL · July 26, 2017 · 53 cites

TL;DR

This paper explores the challenges of generating detailed descriptive documents from data records, highlighting current neural models' limitations and proposing new evaluation methods and baselines.

Contribution

It introduces a large-scale dataset for data-to-document generation, evaluates existing neural approaches, and suggests improvements through copy- and reconstruction-based methods.

Findings

1. Neural models produce fluent but often inaccurate documents.
2. Templated baselines outperform neural models on some metrics.
3. Copy- and reconstruction-based extensions improve performance.

Abstract

Recent neural models have shown significant progress on the problem of generating short descriptive texts conditioned on a small number of database records. In this work, we suggest a slightly more difficult data-to-text generation task, and investigate how effective current approaches are on this task. In particular, we introduce a new, large-scale corpus of data records paired with descriptive documents, propose a series of extractive evaluation methods for analyzing performance, and obtain baseline results using current neural generation methods. Experiments show that these models produce fluent text, but fail to convincingly approximate human-generated documents. Moreover, even templated baselines exceed the performance of these neural models on some metrics, though copy- and reconstruction-based extensions lead to noticeable improvements.
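To make the idea of an extractive evaluation more concrete, the sketch below shows one way such a check could work: relations pulled out of a generated document are compared against the source data records, and the fraction that is supported by the records is reported. This is only an illustration under stated assumptions. The `Relation` tuple layout, the `relation_precision` function, and the toy records are hypothetical and are not taken from the paper, and the extraction step itself (turning text into relations) is assumed to have already happened.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of one relation: (entity, value, record type).
Relation = Tuple[str, str, str]


def relation_precision(extracted: Iterable[Relation], records: Set[Relation]) -> float:
    """Fraction of unique extracted relations that also appear in the records."""
    unique = set(extracted)  # count each distinct relation once
    if not unique:
        return 0.0  # nothing was extracted, so nothing is supported
    return len(unique & records) / len(unique)


# Toy data invented for illustration only.
records = {("Team A", "103", "TEAM-PTS"), ("Team B", "95", "TEAM-PTS")}
extracted = [
    ("Team A", "103", "TEAM-PTS"),  # matches a record
    ("Team B", "98", "TEAM-PTS"),   # wrong value, so unsupported
]

precision = relation_precision(extracted, records)
print(precision)
```

A score like this rewards factual agreement with the data rather than surface overlap with a reference text, which is the kind of gap between fluency and accuracy that the abstract describes.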

Code & Models

Datasets

ErenDemirel/nba_game_summeries_latex
dataset · 4 dl

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications