SWiPE: A Dataset for Document-Level Simplification of Wikipedia Pages
Philippe Laban, Jesse Vig, Wojciech Kryscinski, Shafiq Joty, Caiming Xiong, Chien-Sheng Wu

TL;DR
The paper introduces SWiPE, a comprehensive dataset capturing document-level edits in Wikipedia simplification, along with models to automatically label these edits, advancing understanding of the simplification process.
Contribution
It provides a new dataset with detailed annotations of document-level edits in Wikipedia, and develops models to automatically identify and categorize these edits.
Findings
The SWiPE dataset includes 5,000 annotated document pairs with more than 40,000 labeled edits.
Automatic labeling models achieved an F-1 score of up to 70.6, demonstrating that the task is tractable.
Models trained on SWiPE produce more complex edits and fewer unwanted edits.
Abstract
Text simplification research has mostly focused on sentence-level simplification, even though many desirable edits - such as adding relevant background information or reordering content - may require document-level context. Prior work has also predominantly framed simplification as a single-step, input-to-output task, only implicitly modeling the fine-grained, span-level edits that elucidate the simplification process. To address both gaps, we introduce the SWiPE dataset, which reconstructs the document-level editing process from English Wikipedia (EW) articles to paired Simple Wikipedia (SEW) articles. In contrast to prior work, SWiPE leverages the entire revision history when pairing pages in order to better identify simplification edits. We work with Wikipedia editors to annotate 5,000 EW-SEW document pairs, labeling more than 40,000 edits with 19 proposed categories. To scale our…
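To make the idea of span-level edits between a document pair concrete, here is a minimal sketch that diffs two texts at the word level with Python's standard `difflib`. It is only an illustration under stated assumptions: it is not the alignment or labeling procedure used to build SWiPE, the function name `extract_edits` is hypothetical, and the example sentences are invented. It recovers raw insert, delete, and replace spans but does not assign any of the 19 edit categories.

```python
import difflib


def extract_edits(original: str, simplified: str):
    """Return word-level edits as (operation, original_span, simplified_span).

    Hypothetical helper for illustration; not the SWiPE pipeline.
    """
    src = original.split()
    tgt = simplified.split()
    # autojunk=False keeps frequent words from being treated as junk
    # when the inputs are long documents.
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text is not an edit
        edits.append((tag, " ".join(src[i1:i2]), " ".join(tgt[j1:j2])))
    return edits


if __name__ == "__main__":
    # Invented example pair, not taken from the dataset.
    for edit in extract_edits(
        "The feline rested on the mat",
        "The cat rested on the mat",
    ):
        print(edit)
```

Each returned tuple holds the operation name, the affected words in the original, and the affected words in the simplified text; an empty string marks the side that has no words for a pure insertion or deletion.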
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques
