Simplifying the Bible and Wikipedia Using Statistical Machine Translation

Yohan Jo

arXiv:1703.08646 · cs.CL · March 28, 2017

TL;DR

This paper explores using statistical machine translation to simplify complex texts like the Bible and Wikipedia, demonstrating how SMT components influence the quality of generated simplified versions.

Contribution

It applies SMT techniques to text simplification tasks on religious and encyclopedic texts, highlighting the impact of different SMT components on output quality.

Findings

01

The three main SMT components (phrase translation, language model, and reordering) each have a measurable effect on simplification quality.

02

Adjusting the SMT component weights changes the quality of the simplified text, as measured by METEOR and BLEU.

03

Examples illustrate text "synthesized" into the King James style.
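
The second finding concerns how the component weights change the output. The following is a minimal sketch that assumes the standard log-linear model of phrase-based SMT, the kind Moses implements. Under this model, each candidate output receives a weighted sum of feature scores (translation model, language model, reordering), and the decoder keeps the highest-scoring candidate. The candidate sentences, feature values, and weight settings are hypothetical illustrations. They are not numbers from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    log_tm: float      # log phrase-translation probability (hypothetical value)
    log_lm: float      # log probability under a simple-side language model (hypothetical)
    distortion: float  # reordering cost: 0 = monotone, negative = penalized reordering


def loglinear_score(c: Candidate, weights: dict[str, float]) -> float:
    """Weighted sum of the three feature scores (higher is better)."""
    return (weights["tm"] * c.log_tm
            + weights["lm"] * c.log_lm
            + weights["d"] * c.distortion)


def best(candidates: list[Candidate], weights: dict[str, float]) -> Candidate:
    """Return the candidate a decoder would keep under these weights."""
    return max(candidates, key=lambda c: loglinear_score(c, weights))


# Hypothetical candidates for simplifying the KJV phrase "And it came to pass."
CANDIDATES = [
    Candidate("Then this happened.", math.log(0.2), math.log(0.05), -1.0),
    Candidate("And it came to pass.", math.log(0.6), math.log(0.001), 0.0),
]
TM_HEAVY = {"tm": 1.0, "lm": 0.1, "d": 0.5}  # hypothetical weight settings
LM_HEAVY = {"tm": 0.1, "lm": 1.0, "d": 0.5}

print(best(CANDIDATES, TM_HEAVY).text)  # And it came to pass.
print(best(CANDIDATES, LM_HEAVY).text)  # Then this happened.
```

When the translation-model weight dominates, the sketch keeps the archaic phrasing that the hypothetical phrase table favors. Raising the language-model weight instead favors the simpler, more fluent candidate. In practice, Moses can tune these weights automatically (for example, with MERT). The report, by contrast, changes them to study the effect of each component.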

Abstract

I started this work hoping to build a text synthesizer (like a musical synthesizer) that can imitate certain linguistic styles. Most of the report focuses on text simplification using statistical machine translation (SMT) techniques. I applied MOSES to two parallel corpora: the Bible (King James Version and Easy-to-Read Version) and Wikipedia articles (normal and simplified). I assess the importance of the three main components of SMT (phrase translation, language model, and reordering) by changing their weights and comparing the resulting quality of the simplified text in terms of METEOR and BLEU. Toward the end of the report, I present some examples of text "synthesized" into the King James style.
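
The abstract scores the simplified outputs with METEOR and BLEU. The following is a minimal sketch of corpus-level BLEU under the standard definition. It computes clipped n-gram precisions up to 4-grams, combines them by geometric mean, and multiplies the result by a brevity penalty. The sketch assumes one reference per sentence and plain whitespace tokenization. Real evaluations typically use an established scorer (for example, Moses' multi-bleu.perl or sacreBLEU), whose tokenization and smoothing differ. METEOR, which also matches stems and synonyms, is not sketched here.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: list[str], references: list[str], max_n: int = 4) -> float:
    """Corpus BLEU with one reference per hypothesis and no smoothing."""
    clipped = [0] * max_n  # matched n-gram counts, clipped by reference counts
    totals = [0] * max_n   # total n-grams proposed by the hypotheses
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            clipped[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            totals[n - 1] += sum(hc.values())
    if min(clipped) == 0:  # any zero precision makes the geometric mean zero
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 1.0
```

Because the sketch uses no smoothing, a single sentence with no matching 4-gram scores 0.0. This is one reason BLEU is usually reported at the corpus level, as here.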

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling