Improving Neural Text Simplification Model with Simplified Corpora
Jipeng Qiang

TL;DR
This paper introduces a method to improve neural text simplification by using synthetic sentence pairs generated through back-translation, enhancing model performance without changing the architecture.
Contribution
It proposes a novel data augmentation technique using synthetic sentence pairs to boost neural text simplification performance.
Findings
Significant improvements on WikiLarge and WikiSmall datasets
Synthetic data enhances fluency and simplification quality
Method outperforms state-of-the-art approaches
Abstract
Text simplification (TS) can be viewed as a monolingual translation task, translating between text variations within a single language. Recent neural TS models draw on insights from neural machine translation to learn lexical simplification and content reduction using an encoder-decoder model. Unlike in neural machine translation, however, we cannot obtain enough pairs of ordinary and simplified sentences for TS, because they are expensive and time-consuming to build. Target-side simplified sentences play an important role in boosting fluency for statistical TS, and we investigate the use of simplified sentences to train neural TS models, with no changes to the network architecture. We propose to pair each simple training sentence with a synthetic ordinary sentence via back-translation, and to treat this synthetic data as additional training data. We train an encoder-decoder model using synthetic sentence pairs and original…
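The augmentation step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `build_augmented_corpus`, `toy_back_translate`, and `TOY_LEXICON` are hypothetical names, and the toy word-substitution function merely stands in for a trained simple-to-ordinary model, which the abstract does not specify.

```python
from typing import Callable, List, Tuple

# Each pair is (ordinary sentence, simplified sentence).
Pair = Tuple[str, str]


def build_augmented_corpus(
    parallel_pairs: List[Pair],
    simple_sentences: List[str],
    back_translate: Callable[[str], str],
) -> List[Pair]:
    """Return the real pairs plus (synthetic ordinary, real simple) pairs.

    `back_translate` is assumed to map a simple sentence to an ordinary one;
    in practice it would be a model trained in the simple-to-ordinary direction.
    """
    synthetic = [(back_translate(s), s) for s in simple_sentences]
    return list(parallel_pairs) + synthetic


# Hypothetical toy stand-in for a trained back-translation model:
# it swaps a few simple words for more complex ones.
TOY_LEXICON = {"use": "utilize", "help": "assist"}


def toy_back_translate(simple: str) -> str:
    return " ".join(TOY_LEXICON.get(tok, tok) for tok in simple.split())


real_pairs = [("He commenced the task .", "He started the task .")]
extra_simple = ["They use maps to help tourists ."]
augmented = build_augmented_corpus(real_pairs, extra_simple, toy_back_translate)

for ordinary, simple in augmented:
    print(f"{ordinary}  ->  {simple}")
```

The target side of every synthetic pair is a genuine simplified sentence; only the source side is machine-generated. The resulting list can be fed to an unchanged encoder-decoder training pipeline, which matches the abstract's point that the network architecture is not modified.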
Taxonomy
Topics: Text Readability and Simplification · Natural Language Processing Techniques · Topic Modeling
Methods: Spatio-temporal stability analysis
