WikiIns: A High-Quality Dataset for Controlled Text Editing by Natural Language Instruction
Xiang Chen, Zheng Li, Xiaojun Wan

TL;DR
WikiIns is a high-quality dataset designed for controlled text editing using natural language instructions, addressing limitations of previous datasets by providing more informative instructions and supporting research in text revision tasks.
Contribution
The paper introduces WikiIns, a new dataset with high-quality, informative instructions for controlled text editing, along with automatic methods for generating large-scale training data and a comprehensive analysis of the dataset.
Findings
The dataset improves instruction informativeness for text editing.
Automatic methods effectively generate large-scale training data.
Analysis reveals insights into edit intentions and dataset quality.
Abstract
Text editing, i.e., the process of modifying or manipulating text, is a crucial step in the human writing process. In this paper, we study the problem of controlled text editing by natural language instruction: given an instruction that conveys the edit intention and the necessary information, an original draft text must be revised into a target text. Existing automatically constructed datasets for this task are limited because they lack informative natural language instructions. Informativeness requires that the instruction contain enough information to produce the revised text. To address this limitation, we build and release WikiIns, a high-quality controlled text editing dataset with improved informativeness. We first preprocess the Wikipedia edit history database to extract the raw data (WikiIns-Raw). Then we crowdsource high-quality validation and…
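To make the task setup and the notion of informativeness concrete, here is a minimal Python sketch. The record fields (`source`, `instruction`, `target`) and the token-overlap check are hypothetical illustrations, not the WikiIns release format or the authors' filtering method. The check treats an instruction as uninformative when the target contains tokens found in neither the draft nor the instruction, which is only a crude proxy for "enough information to produce the revised text".

```python
from dataclasses import dataclass


@dataclass
class EditInstance:
    """One controlled-editing example: (draft, instruction) -> target.
    Hypothetical field names, not the WikiIns data format."""
    source: str       # original draft text
    instruction: str  # natural language edit instruction
    target: str       # revised text


def tokens(text: str) -> set[str]:
    # Lowercased whitespace tokens with surrounding punctuation stripped.
    return {t.strip(".,;:!?\"'()").lower() for t in text.split()} - {""}


def unsupported_tokens(ex: EditInstance) -> set[str]:
    # Tokens in the target that appear in neither the source nor the instruction.
    return tokens(ex.target) - tokens(ex.source) - tokens(ex.instruction)


def is_informative(ex: EditInstance) -> bool:
    # Hypothetical proxy: every new target token must be recoverable from the instruction.
    return not unsupported_tokens(ex)


vague = EditInstance(
    source="The bridge opened in 1932.",
    instruction="Update the date.",
    target="The bridge opened in 1933.",
)
specific = EditInstance(
    source="The bridge opened in 1932.",
    instruction="Change the opening year to 1933.",
    target="The bridge opened in 1933.",
)
print(is_informative(vague), is_informative(specific))  # False True
```

The vague instruction states the edit intention but not the new year, so the target cannot be produced from it alone; the specific instruction carries the missing value. Real informativeness judgments (e.g., by crowd workers) handle paraphrase and world knowledge that simple token overlap cannot.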
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Wikis in Education and Collaboration
