Unsupervised Neural Text Simplification

Sai Surya, Abhijit Mishra, Anirban Laha, Parag Jain, Karthik Sankaranarayanan

arXiv:1810.07931·cs.CL·August 22, 2019·1 cites

TL;DR

This paper introduces an unsupervised neural model for text simplification that learns from unlabeled text alone. It is competitive with supervised methods, and adding a few labeled pairs improves its results further.

Contribution

It presents a first attempt at unsupervised neural text simplification using only unlabeled corpora, combining a shared encoder and a pair of attentional decoders trained with discrimination-based losses and denoising.

Findings

01 Model performs well at lexical and syntactic simplification

02 Competitive with supervised methods on public benchmarks

03 Adding a few labeled pairs enhances performance

Abstract

The paper presents a first attempt towards unsupervised neural text simplification that relies only on unlabeled text corpora. The core framework is composed of a shared encoder and a pair of attentional decoders, and it gains knowledge of simplification through discrimination-based losses and denoising. The framework is trained using unlabeled text collected from an en-Wikipedia dump. Our analysis (both quantitative and qualitative, involving human evaluators) on public test data shows that the proposed model can perform text simplification at both lexical and syntactic levels, competitive with existing supervised methods. The addition of a few labelled pairs improves performance further.
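The denoising component mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which noise function is used, so the one below is an assumption: word dropout followed by a bounded local shuffle, a common choice in unsupervised sequence-to-sequence training. The function name `add_noise` and its parameters `drop_prob` and `shuffle_window` are hypothetical and are not taken from the paper or its repository. In a denoising setup, a model would be trained to reconstruct the original sentence from the corrupted copy.

```python
import random


def add_noise(tokens, drop_prob=0.1, shuffle_window=3, rng=None):
    """Corrupt a token list for denoising training (illustrative sketch only).

    Assumed noise model, not confirmed by the paper:
      1. drop each token independently with probability `drop_prob`;
      2. shuffle the survivors locally, so that each one moves fewer
         than `shuffle_window` positions from where it started.
    """
    rng = rng or random.Random()

    # Step 1: word dropout. Keep at least one token for non-empty input.
    kept = [t for t in tokens if rng.random() >= drop_prob]
    if not kept and tokens:
        kept = [rng.choice(tokens)]

    # Step 2: local shuffle. Each token gets the sort key
    # index + uniform noise in [0, shuffle_window); sorting by that key
    # can only swap tokens that started close to each other.
    keys = [i + rng.random() * shuffle_window for i in range(len(kept))]
    order = sorted(range(len(kept)), key=lambda i: keys[i])
    return [kept[i] for i in order]


if __name__ == "__main__":
    sentence = "the framework is trained using unlabeled text".split()
    noisy = add_noise(sentence, rng=random.Random(0))
    # A denoising objective would ask the model to map `noisy` back to `sentence`.
    print(noisy)
```

Passing a seeded `random.Random` instance keeps the corruption reproducible, which helps when comparing training runs.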

Code & Models

Repositories

subramanyamdvss/UnsupNTS (PyTorch, official)

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling