Towards Content Transfer through Grounded Text Generation
Shrimai Prabhumoye, Chris Quirk, Michel Galley

TL;DR
This paper introduces Content Transfer, a task for controlling the content of neural text generation: given a document, generate a next sentence that fits its context and is grounded in an external textual source. The task is demonstrated on Wikipedia data, and a new benchmark dataset is released.
Contribution
It proposes the novel task of Content Transfer for long-form text generation and releases a benchmark dataset of 640k Wikipedia referenced sentences paired with their source articles to facilitate research in this area.
Findings
Significant improvements over competitive baselines in experiments on Wikipedia data
Successful grounding of generated sentences in external sources
Introduction of a new benchmark dataset for content-controlled generation
Abstract
Recent work in neural generation has attracted significant interest in controlling the form of text, such as style, persona, and politeness. However, there has been less work on controlling neural text generation for content. This paper introduces the notion of Content Transfer for long-form text generation, where the task is to generate a next sentence in a document that both fits its context and is grounded in a content-rich external textual source such as a news story. Our experiments on Wikipedia data show significant improvements against competitive baselines. As another contribution of this paper, we release a benchmark dataset of 640k Wikipedia referenced sentences paired with the source articles to encourage exploration of this new task.
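To make the task definition above concrete, the following is a minimal sketch of what one task instance might look like, together with a naive extractive baseline. The class name, field names, and the word-overlap heuristic are all illustrative assumptions; they are not taken from the paper, its released dataset format, or its models.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class ContentTransferExample:
    """One hypothetical task instance; field names are assumptions."""
    context: str  # existing document text preceding the sentence to generate
    source: str   # content-rich external text, e.g. a news story
    target: str   # reference next sentence, grounded in the source


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def split_sentences(text: str) -> List[str]:
    """Very rough sentence splitter: break after ., ! or ? plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def overlap_baseline(example: ContentTransferExample) -> str:
    """Return the source sentence sharing the most tokens with the context.

    This is an assumed toy baseline for illustration only, not a method
    described in the paper. Returns "" when the source has no sentences.
    """
    context_tokens = tokenize(example.context)
    sentences = split_sentences(example.source)
    if not sentences:
        return ""
    return max(sentences, key=lambda s: len(tokenize(s) & context_tokens))
```

A generation model for this task would replace `overlap_baseline` with something that writes a new sentence rather than copying one; the sketch only shows the shape of the inputs (context plus external source) and the output (a single next sentence).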
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
