Handling Syntactic Divergence in Low-resource Machine Translation

Chunting Zhou; Xuezhe Ma; Junjie Hu; Graham Neubig

arXiv:1909.00040·cs.CL·October 8, 2019·1 cites

TL;DR

This paper introduces a reordering technique for target sentences to improve neural machine translation in extremely low-resource, syntactically divergent language pairs, outperforming existing semi-supervised methods.

Contribution

It proposes a simple method that reorders target-language sentences to match source-language word order and uses them as additional training-time supervision, improving low-resource NMT in settings where back-translation falls short.

Findings

1. Significant improvements in simulated low-resource Japanese-to-English translation
2. Effective in real low-resource Uyghur-to-English scenarios
3. Outperforms other semi-supervised alternatives

Abstract

Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs. Data augmentation methods such as back-translation make it possible to use monolingual data to help alleviate these issues, but back-translation itself fails in extreme low-resource scenarios, especially for syntactically divergent languages. In this paper, we propose a simple yet effective solution, whereby target-language sentences are re-ordered to match the order of the source and used as an additional source of training-time supervision. Experiments with simulated low-resource Japanese-to-English, and real low-resource Uyghur-to-English scenarios find significant improvements over other semi-supervised alternatives.
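
To make the reordering idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a word alignment is already available as a mapping from target token index to source token index; the function name `reorder_target`, the alignment format, and the rule for unaligned tokens (attach to the next aligned token on the right) are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def reorder_target(target_tokens: List[str], alignment: Dict[int, int]) -> List[str]:
    """Reorder target tokens so they follow source-side word order.

    `alignment` maps a target token index to the index of the source token
    it is aligned with (hypothetical input, e.g. from a word aligner).
    Assumption for this sketch: an unaligned target token inherits the
    source position of the nearest aligned token to its right, so that
    function words such as determiners travel with the word they precede.
    """
    n = len(target_tokens)
    keys: List[Tuple[int, int]] = [(0, 0)] * n
    # Trailing unaligned tokens default to the last source position seen.
    next_pos = max(alignment.values(), default=0)
    for i in range(n - 1, -1, -1):
        if i in alignment:
            next_pos = alignment[i]
        # The second element keeps the original order among ties.
        keys[i] = (next_pos, i)
    order = sorted(range(n), key=keys.__getitem__)
    return [target_tokens[i] for i in order]


# English target with an SOV source: watashi-wa(0) ringo-o(1) tabeta(2)
target = ["I", "ate", "an", "apple"]
alignment = {0: 0, 1: 2, 3: 1}  # "an" (index 2) is unaligned
print(reorder_target(target, alignment))  # ['I', 'an', 'apple', 'ate']
```

The reordered sentence keeps English vocabulary but follows the verb-final order of the source, which is the kind of source-ordered target sentence the abstract describes using as extra supervision. How the alignments are obtained and how the reordered sentences enter training are left out here because the page does not specify them.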

Code & Models

Repositories

violet-zct/pytorch-reorder-nmt (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications