Handling Syntactic Divergence in Low-resource Machine Translation
Chunting Zhou, Xuezhe Ma, Junjie Hu, Graham Neubig

TL;DR
This paper introduces a reordering technique for target sentences to improve neural machine translation in extremely low-resource, syntactically divergent language pairs, outperforming existing semi-supervised methods.
Contribution
It proposes a simple method that reorders target-language sentences to match source word order and uses them as additional training-time supervision, improving low-resource NMT in settings where back-translation fails.
Findings
Significant improvements in simulated low-resource Japanese-to-English translation
Effective in a real low-resource Uyghur-to-English scenario
Outperforms other semi-supervised alternatives in both settings
Abstract
Despite impressive empirical successes of neural machine translation (NMT) on standard benchmarks, limited parallel data impedes the application of NMT models to many language pairs. Data augmentation methods such as back-translation make it possible to use monolingual data to help alleviate these issues, but back-translation itself fails in extreme low-resource scenarios, especially for syntactically divergent languages. In this paper, we propose a simple yet effective solution, whereby target-language sentences are re-ordered to match the order of the source and used as an additional source of training-time supervision. Experiments with simulated low-resource Japanese-to-English, and real low-resource Uyghur-to-English scenarios find significant improvements over other semi-supervised alternatives.
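The reordering idea in the abstract can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not detail. It assumes word alignments between source and target are already available as (source_index, target_index) pairs. The function name `reorder_target` and the rule for unaligned words (attach to the nearest aligned word on the right) are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Optional, Tuple


def reorder_target(
    target_tokens: List[str],
    alignment: List[Tuple[int, int]],
) -> List[str]:
    """Permute target tokens so they follow source-language word order.

    alignment holds (source_index, target_index) pairs (assumed given).
    Unaligned target tokens take the source position of the nearest
    aligned token to their right; trailing unaligned tokens go last.
    """
    # Smallest aligned source position for each aligned target token.
    src_pos: Dict[int, int] = {}
    for s, t in alignment:
        src_pos[t] = min(s, src_pos.get(t, s))

    n = len(target_tokens)
    keys: List[Optional[int]] = [None] * n
    nearest_right: Optional[int] = None
    # Sweep right to left so each token sees the aligned token to its right.
    for t in range(n - 1, -1, -1):
        if t in src_pos:
            nearest_right = src_pos[t]
        keys[t] = nearest_right

    inf = float("inf")
    # Sort by source position; ties keep the original target order.
    order = sorted(
        range(n),
        key=lambda t: (keys[t] if keys[t] is not None else inf, t),
    )
    return [target_tokens[t] for t in order]


# Source (Japanese, SOV): 私 は リンゴ を 食べた
# Target (English, SVO):  I ate an apple
# Hypothetical alignment: 私-I, リンゴ-apple, 食べた-ate
pairs = [(0, 0), (2, 3), (4, 1)]
print(reorder_target(["I", "ate", "an", "apple"], pairs))
# prints ['I', 'an', 'apple', 'ate']
```

In this toy case the English sentence is rearranged into verb-final order, mirroring the Japanese source. Reordered sentences of this kind are what the abstract describes as an additional source of training-time supervision.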
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
