Copy Is All You Need
Tian Lan, Deng Cai, Yan Wang, Heyan Huang, Xian-Ling Mao

TL;DR
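This paper introduces a text generation method that copies segments from an existing text collection. It improves generation quality, keeps inference efficiency comparable to token-level autoregressive models, and allows domain adaptation by switching text collections without additional training.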
Contribution
It proposes a copy-based text generation framework built on an index of contextualized text-segment representations. Compared with traditional vocabulary-based models, it offers better generation quality, comparable inference efficiency, and easier domain adaptation.
Findings
Achieves better generation quality on the standard language modeling benchmark (WikiText-103) according to both automatic and human evaluations.
Achieves inference efficiency comparable to token-level autoregressive models.
Enables domain adaptation by switching text collections without retraining.
Abstract
The dominant text generation models compose the output by sequentially selecting words from a fixed vocabulary. In this paper, we formulate text generation as progressively copying text segments (e.g., words or phrases) from an existing text collection. We compute the contextualized representations of meaningful text segments and index them using efficient vector search toolkits. The task of text generation is then decomposed into a series of copy-and-paste operations: at each time step, we seek suitable text spans from the text collection rather than selecting from a standalone vocabulary. Experiments on the standard language modeling benchmark (WikiText-103) show that our approach achieves better generation quality according to both automatic and human evaluations. Besides, its inference efficiency is comparable to token-level autoregressive models thanks to the reduction of decoding…
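The copy-and-paste loop described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `embed` is a bag-of-words stand-in for a neural encoder, spans are keyed by the words that precede them in the source text (a simplification, not the paper's phrase representations), the search is brute force where the paper uses a vector search toolkit, and the names `build_index` and `generate` are hypothetical.

```python
from collections import Counter
from math import sqrt


def embed(words):
    """Toy stand-in for a neural encoder: L2-normalised bag-of-words counts."""
    counts = Counter(w.lower() for w in words)
    norm = sqrt(sum(c * c for c in counts.values())) or 1.0
    return {w: c / norm for w, c in counts.items()}


def dot(a, b):
    """Inner product of two sparse vectors stored as dicts."""
    return sum(v * b.get(k, 0.0) for k, v in a.items())


def build_index(documents, context=2, max_len=3):
    """Index every span of up to max_len words, keyed by the words before it."""
    index = []
    for doc in documents:
        words = doc.split()
        for start in range(1, len(words)):
            key = embed(words[max(0, start - context):start])
            last = min(start + max_len, len(words))
            for end in range(start + 1, last + 1):
                index.append((key, words[start:end]))
    return index


def generate(prefix, index, steps=3, context=2):
    """Greedy decoding: at each step, copy the best-matching span."""
    out = prefix.split()
    for _ in range(steps):
        query = embed(out[-context:])
        # Brute-force inner-product search; longer spans win ties.
        score, span = max(
            ((dot(query, key), span) for key, span in index),
            key=lambda pair: (pair[0], len(pair[1])),
        )
        if score <= 0.0:  # nothing in the collection matches the context
            break
        out.extend(span)
    return " ".join(out)


# Two hypothetical one-sentence "collections" standing in for two domains.
wiki_index = build_index(["the cat sat on the mat"])
legal_index = build_index(["the cat shall vacate the premises"])

print(generate("the cat", wiki_index, steps=2))   # the cat sat on the mat
print(generate("the cat", legal_index, steps=2))  # the cat shall vacate the premises
```

The same `generate` function produces domain-specific continuations purely by swapping the index, which mirrors the training-free domain adaptation claimed in the findings. Each step also emits a multi-word span, so the toy decoder takes two steps to produce four words.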
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
