For the sake of simplicity: Unsupervised extraction of lexical simplifications from Wikipedia

Mark Yatskar; Bo Pang; Cristian Danescu-Niculescu-Mizil; Lillian Lee

arXiv:1008.1986·cs.CL·August 13, 2010·131 cites

Open Access

TL;DR

This paper explores unsupervised methods for extracting lexical simplifications from Simple English Wikipedia edit histories, aiming to improve automatic simplification resources by identifying high-quality simplifications without manual annotation.

Contribution

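It introduces two unsupervised approaches to identifying lexical simplifications in Simple English Wikipedia edit histories: an edit model that derives simplification probabilities from a mixture of edit operations, and a metadata-based method that focuses on edits more likely to be simplifications. Both outperform a reasonable baseline.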

Findings

1. Both methods outperform a reasonable baseline at extracting lexical simplifications.
2. The generated list includes many high-quality simplifications absent from an independently created, manually prepared list.
3. The approaches rely on edit probabilities and on edit metadata.

Abstract

We report on work in progress on extracting lexical simplifications (e.g., "collaborate" -> "work together"), focusing on utilizing edit histories in Simple English Wikipedia for this task. We consider two main approaches: (1) deriving simplification probabilities via an edit model that accounts for a mixture of different operations, and (2) using metadata to focus on edits that are more likely to be simplification operations. We find our methods to outperform a reasonable baseline and yield many high-quality lexical simplifications not included in an independently-created manually prepared list.
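The second approach in the abstract, using metadata to focus on likely simplification edits, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes, hypothetically, that each revision is available as a (comment, sentence before, sentence after) triple and that a comment containing "simpl" marks a simplification edit. The function names and the toy data are illustrative only.

```python
from collections import Counter
from difflib import SequenceMatcher


def is_simplification_comment(comment: str) -> bool:
    # Assumption: a revision comment mentioning "simpl" signals a simplification edit.
    return "simpl" in comment.lower()


def extract_substitutions(before: str, after: str):
    """Yield (old_phrase, new_phrase) pairs for token spans that were replaced."""
    old_tokens, new_tokens = before.split(), after.split()
    matcher = SequenceMatcher(a=old_tokens, b=new_tokens, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            yield " ".join(old_tokens[i1:i2]), " ".join(new_tokens[j1:j2])


def count_candidate_pairs(revisions):
    """Count substitution pairs seen in revisions whose comment suggests simplification."""
    counts = Counter()
    for comment, before, after in revisions:
        if is_simplification_comment(comment):
            counts.update(extract_substitutions(before, after))
    return counts


# Toy data (hypothetical); real input would come from Wikipedia revision histories.
revisions = [
    ("simplified wording", "They collaborate on the project", "They work together on the project"),
    ("fixed typo", "The cat sat on teh mat", "The cat sat on the mat"),
    ("simplify", "We collaborate often", "We work together often"),
]
pair_counts = count_candidate_pairs(revisions)
for (old, new), n in pair_counts.most_common():
    print(f"{old} -> {new}: {n}")
```

Raw counts are only a crude proxy for the simplification probabilities the abstract mentions. The sketch shows the filtering step alone and leaves out any modeling of the different edit operations.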

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling