Improving Neural Text Simplification Model with Simplified Corpora

Jipeng Qiang

arXiv:1810.04428 · cs.CL · October 11, 2018 · 6 cites

Open Access

TL;DR

This paper introduces a method to improve neural text simplification by using synthetic sentence pairs generated through back-translation, enhancing model performance without changing the architecture.

Contribution

It proposes a novel data augmentation technique using synthetic sentence pairs to boost neural text simplification performance.

Findings

01 Significant improvements on WikiLarge and WikiSmall datasets

02 Synthetic data enhances fluency and simplification quality

03 Method outperforms state-of-the-art approaches

Abstract

Text simplification (TS) can be viewed as a monolingual translation task, translating between text variations within a single language. Recent neural TS models draw on insights from neural machine translation to learn lexical simplification and content reduction using an encoder-decoder model. But unlike in neural machine translation, we cannot obtain enough ordinary and simplified sentence pairs for TS, since these are expensive and time-consuming to build. Target-side simplified sentences play an important role in boosting fluency for statistical TS, and we investigate the use of simplified sentences in training, with no changes to the network architecture. We propose to pair each simple training sentence with a synthetic ordinary sentence obtained via back-translation, and to treat this synthetic data as additional training data. We train an encoder-decoder model using synthetic sentence pairs and original…
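The data augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: `build_augmented_corpus` and `toy_back_translate` are hypothetical names, and the toy word-substitution function only stands in for a trained simple-to-ordinary model, which the abstract does not specify. The sketch shows only how each simplified sentence is paired with a synthetic ordinary sentence and how those pairs are appended to the original training pairs.

```python
from typing import Callable, List, Tuple

# A training pair is (ordinary sentence, simplified sentence).
Pair = Tuple[str, str]


def build_augmented_corpus(
    original_pairs: List[Pair],
    simple_sentences: List[str],
    back_translate: Callable[[str], str],
) -> List[Pair]:
    """Return the original pairs plus synthetic (ordinary, simple) pairs.

    `back_translate` maps a simplified sentence to a synthetic ordinary
    sentence. Here it is any callable; in practice it would be a trained
    simple-to-ordinary model (an assumption, not taken from the source).
    """
    synthetic_pairs = [(back_translate(s), s) for s in simple_sentences]
    return original_pairs + synthetic_pairs


def toy_back_translate(simple: str) -> str:
    """Hypothetical stand-in for a back-translation model.

    Swaps a few simple words for more complex ones so the example runs
    without any trained model.
    """
    replacements = {"use": "utilize", "help": "assist"}
    return " ".join(replacements.get(word, word) for word in simple.split())


# Illustrative data only; these sentences are not from WikiLarge or WikiSmall.
original = [("He utilized the tool .", "He used the tool .")]
simple_only = ["we use maps", "they help us"]

corpus = build_augmented_corpus(original, simple_only, toy_back_translate)
for ordinary, simplified in corpus:
    print(f"{ordinary!r} -> {simplified!r}")
```

The combined list can then be fed to an unchanged encoder-decoder training loop, which matches the abstract's point that the architecture itself is not modified.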

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling

Methods: Spatio-temporal stability analysis