Memory-augmented Chinese-Uyghur Neural Machine Translation

Shiyue Zhang; Gulnigar Mahmut; Dong Wang; Askar Hamdulla

arXiv:1706.08683 · cs.CL · June 28, 2017

TL;DR

This paper introduces a memory-augmented neural machine translation (M-NMT) model for Chinese-Uyghur translation. Trained on a mid-sized dataset, the model improves translation quality and handles rare words better by means of a novel memory structure.

Contribution

It proposes a novel memory structure that enhances NMT for translation involving a low-resource, agglutinative language (Uyghur). The resulting model outperforms both vanilla NMT and phrase-based SMT.

Findings

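1. Memory-augmented NMT (M-NMT) outperforms both vanilla NMT and phrase-based SMT.
2. The memory structure offers an effective way to handle out-of-vocabulary words.
3. A mid-scale dataset of about 200,000 sentence pairs is enough for an attention-based NMT to achieve high-quality Chinese-Uyghur and Uyghur-Chinese translation.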

Abstract

Neural machine translation (NMT) has recently achieved notable performance. However, this approach has not been widely applied to translation between Chinese and Uyghur, partly because parallel data are limited and partly because the agglutinative nature of Uyghur produces a large proportion of rare words. In this paper, we collect ~200,000 sentence pairs and show that with this mid-scale dataset, an attention-based NMT can perform very well on Chinese-Uyghur/Uyghur-Chinese translation. To tackle rare words, we propose a novel memory structure to assist NMT inference. Our experiments demonstrate that the memory-augmented NMT (M-NMT) outperforms both the vanilla NMT and phrase-based statistical machine translation (SMT). Interestingly, the memory structure provides an elegant way of dealing with out-of-vocabulary words.
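The abstract does not spell out how the memory assists inference. As a rough illustration only, the sketch below assumes a memory of source-to-target word pairs (for example, drawn from a bilingual lexicon), reads it with the decoder's attention weights, and linearly interpolates the result with the NMT softmax. The function names `memory_distribution` and `combine`, the mixing weight `lam`, the memory format, and the placeholder tokens are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def memory_distribution(attention, source_words, memory):
    """Map attention over source positions to a distribution over target words.

    Hypothetical memory format: source word -> list of (target_word, prob)
    pairs, e.g. drawn from a bilingual lexicon. Source words absent from the
    memory contribute no probability mass.
    """
    dist = defaultdict(float)
    for weight, src in zip(attention, source_words):
        for tgt, prob in memory.get(src, []):
            dist[tgt] += weight * prob
    total = sum(dist.values())
    # Renormalize so the memory distribution sums to 1 when it has any mass.
    return {w: p / total for w, p in dist.items()} if total > 0 else {}


def combine(nmt_probs, mem_probs, lam=0.3):
    """Linearly interpolate the NMT and memory distributions.

    `lam` is a hypothetical mixing weight; a word missing from one
    distribution is treated as having probability 0 there.
    """
    if not mem_probs:
        # No memory evidence: fall back to the plain NMT distribution.
        return dict(nmt_probs)
    words = set(nmt_probs) | set(mem_probs)
    return {
        w: (1 - lam) * nmt_probs.get(w, 0.0) + lam * mem_probs.get(w, 0.0)
        for w in words
    }


# Toy decoding step with placeholder tokens (not real Chinese/Uyghur words).
nmt_probs = {"ug_a": 0.6, "ug_b": 0.3, "<unk>": 0.1}  # "ug_rare" is outside the NMT vocabulary
source_words = ["zh_a", "zh_rare"]
attention = [0.1, 0.9]  # the decoder attends mostly to the rare source word
memory = {"zh_a": [("ug_a", 1.0)], "zh_rare": [("ug_rare", 1.0)]}

mem_probs = memory_distribution(attention, source_words, memory)
combined = combine(nmt_probs, mem_probs, lam=0.5)
best = max(combined, key=combined.get)  # "ug_rare" (0.45 vs. 0.35 for "ug_a")
```

Because the memory can place probability on a target word the NMT vocabulary lacks (`ug_rare` here), interpolation of this kind is one plausible way a memory could help with out-of-vocabulary words; the paper's actual memory design may differ.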
