A Benchmark for Text Expansion: Datasets, Metrics, and Baselines
Yi Chen, Haiyun Jiang, Wei Bi, Rui Wang, Longyue Wang, Shuming Shi, Ruifeng Xu

TL;DR
This paper introduces Text Expansion as a new task, providing datasets, metrics, and baselines to improve the insertion of modifiers into plain text for more vivid and concrete writing.
Contribution
It defines the TE task, creates a large-scale dataset with automatic and human annotations, and proposes new evaluation metrics including Info-Gain for informativeness.
Findings
TE is feasible with current models.
Proposed models outperform Text2Text baselines.
Info-Gain effectively measures expansion informativeness.
Abstract
This work presents a new task of Text Expansion (TE), which aims to insert fine-grained modifiers into proper locations of the plain text to concretize or vivify human writings. Different from existing insertion-based writing assistance tasks, TE requires the model to be more flexible in both locating and generation, and also more cautious in keeping basic semantics. We leverage four complementary approaches to construct a dataset with 12 million automatically generated instances and 2K human-annotated references for both English and Chinese. To facilitate automatic evaluation, we design various metrics from multiple perspectives. In particular, we propose Info-Gain to effectively measure the informativeness of expansions, which is an important quality dimension in TE. On top of a pre-trained text-infilling model, we build both pipelined and joint Locate&Infill models, which demonstrate…
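The insertion-only nature of the task described in the abstract can be illustrated with a small check. The sketch below is not the authors' code: the function name `inserted_spans` is hypothetical, and whitespace tokenization is an assumption that suits English examples only (Chinese text would need a proper segmenter). It tests whether an expansion keeps every source token in order and, if so, reports which tokens were inserted and where.

```python
from typing import List, Optional, Tuple


def inserted_spans(source: str, expansion: str) -> Optional[List[Tuple[int, List[str]]]]:
    """Return (position, inserted_tokens) pairs if `expansion` equals `source`
    plus insertions only; return None if any source token is missing or reordered.

    `position` is the index in the source token list before which the span
    was inserted (len(source tokens) means "appended at the end").
    """
    src = source.split()      # assumption: whitespace tokenization
    exp = expansion.split()

    spans: List[Tuple[int, List[str]]] = []
    current: List[str] = []   # inserted tokens collected since the last match
    i = 0                     # index of the next source token to match

    for tok in exp:
        if i < len(src) and tok == src[i]:
            # Source token preserved: close any open inserted span.
            if current:
                spans.append((i, current))
                current = []
            i += 1
        else:
            # Token not aligned with the source: treat it as inserted.
            current.append(tok)

    if i != len(src):
        # Some source token was never matched, so this is not a pure insertion.
        return None
    if current:
        spans.append((i, current))
    return spans


plain = "The cat sat on the mat"
expanded = "The fluffy cat sat quietly on the old mat"
print(inserted_spans(plain, expanded))
```

A check like this only captures the surface constraint that the original words survive; it says nothing about whether the inserted modifiers are informative or preserve the basic semantics, which is what the paper's metrics are meant to assess.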
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
