Synthetic Source Language Augmentation for Colloquial Neural Machine Translation

Asrul Sani Ariesandy; Mukhlis Amien; Alham Fikri Aji; Radityo Eko Prasojo

arXiv:2012.15178·cs.CL·January 1, 2021

TL;DR

This paper introduces a synthetic style augmentation method for NMT that improves translation of colloquial Indonesian into English, and it contributes a new colloquial Indonesian-English test set for evaluating models on informal language.

Contribution

The work develops a novel colloquial Indonesian-English test set and demonstrates that synthetic style augmentation of formal source data improves NMT performance on colloquial language.

Findings

1. Improved BLEU scores on colloquial Indonesian-English translation

2. Created a new colloquial Indonesian-English test set from YouTube and Twitter

3. Synthetic style augmentation benefits NMT in handling informal language

Abstract

Neural machine translation (NMT) is typically domain-dependent and style-dependent, and it requires large amounts of training data. State-of-the-art NMT models often fall short in handling colloquial variations of their source language, and the lack of parallel data in this regard is a challenging hurdle in systematically improving the existing models. In this work, we develop a novel colloquial Indonesian-English test set collected from YouTube transcripts and Twitter. We perform synthetic style augmentation on the formal Indonesian source side and show that it improves the baseline Id-En models (in BLEU) on the new test data.
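The abstract does not spell out how the synthetic style augmentation is performed, so the following is only a minimal sketch of one plausible reading: rewrite formal source-side words into colloquial variants while leaving the English target untouched, then add the synthetic pairs to the training data. The lexicon entries, the function names `colloquialize` and `augment_corpus`, and the `rate` parameter are all assumptions made for illustration and are not taken from the paper.

```python
import random

# Hypothetical formal -> colloquial lexicon. The entries are illustrative
# examples of informal Indonesian and are NOT the paper's resource.
COLLOQUIAL_LEXICON = {
    "tidak": ["nggak", "gak"],
    "sudah": ["udah"],
    "saja": ["aja"],
    "bagaimana": ["gimana"],
}


def colloquialize(sentence, lexicon, rate, rng):
    """Replace each known formal token with a colloquial variant
    with probability `rate`; unknown tokens are kept unchanged."""
    out = []
    for tok in sentence.split():
        variants = lexicon.get(tok.lower())
        if variants and rng.random() < rate:
            out.append(rng.choice(variants))
        else:
            out.append(tok)
    return " ".join(out)


def augment_corpus(pairs, lexicon, rate=0.5, seed=0):
    """Return the original (source, target) pairs plus synthetic pairs
    whose source side was rewritten and whose target is unchanged."""
    rng = random.Random(seed)
    augmented = list(pairs)
    for src, tgt in pairs:
        synthetic = colloquialize(src, lexicon, rate, rng)
        if synthetic != src:  # skip sentences with no replacement
            augmented.append((synthetic, tgt))
    return augmented


if __name__ == "__main__":
    corpus = [("saya sudah makan", "I have eaten")]
    for src, tgt in augment_corpus(corpus, COLLOQUIAL_LEXICON, rate=1.0):
        print(src, "|||", tgt)
```

Keeping the original formal pairs alongside the synthetic ones is a design choice in this sketch, on the assumption that the model should still handle formal input; a real pipeline would also need proper tokenization and handling of casing and punctuation.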

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications