For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia
Mark Yatskar, Bo Pang, Cristian Danescu-Niculescu-Mizil, Lillian Lee

TL;DR
This paper explores unsupervised methods for extracting lexical simplifications from Simple English Wikipedia edit histories, aiming to improve automatic simplification resources by identifying high-quality simplifications without manual annotation.
Contribution
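It introduces two unsupervised approaches: an edit model that derives simplification probabilities from a mixture of edit operations, and a metadata-based method that focuses on edits more likely to be simplifications. Both outperform a reasonable baseline.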
Findings
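Both methods outperform a reasonable baseline at extracting lexical simplifications
The extracted list includes many high-quality simplifications absent from an independently created, manually prepared list
One approach derives simplification probabilities from an edit model; the other uses metadata to isolate likely simplification edits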
Abstract
We report on work in progress on extracting lexical simplifications (e.g., "collaborate" -> "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently-created manually prepared list.
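The metadata-based idea in approach (2) can be illustrated with a minimal sketch. It is not the authors' implementation: the keyword `"simpl"`, the `(original, replacement, comment)` edit format, the function name `rank_simplifications`, and the relative-frequency score are all assumptions made here for illustration, and the sample edits are invented toy data.

```python
from collections import Counter


def rank_simplifications(edits, keyword="simpl"):
    """Rank candidate lexical simplifications from toy edit records.

    edits: iterable of (original, replacement, comment) tuples.
    keyword: hypothetical marker in the revision comment suggesting
             that the edit was intended as a simplification.
    """
    pair_counts = Counter()
    source_counts = Counter()
    for original, replacement, comment in edits:
        # Keep only edits whose metadata hints at simplification.
        if keyword in comment.lower():
            pair_counts[(original, replacement)] += 1
            source_counts[original] += 1

    # Score each pair by its relative frequency among the kept edits
    # that share the same original phrase (an assumed scoring rule).
    scored = [
        (orig, repl, count / source_counts[orig])
        for (orig, repl), count in pair_counts.items()
    ]
    scored.sort(key=lambda item: (-item[2], item[0], item[1]))
    return scored


# Invented toy data, not drawn from Wikipedia.
edits = [
    ("collaborate", "work together", "simplified wording"),
    ("collaborate", "work together", "simplify"),
    ("collaborate", "cooperate", "simpler word"),
    ("colour", "color", "spelling fix"),
]
ranked = rank_simplifications(edits)
for orig, repl, score in ranked:
    print(f"{orig} -> {repl}: {score:.2f}")
```

Filtering on the comment field before counting keeps spelling fixes and other non-simplifying edits out of the candidate list, which is the intuition behind focusing on edits that are more likely to be simplification operations.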
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
