Active Learning for Massively Parallel Translation of Constrained Text into Low Resource Languages
Zhong Zhou, Alex Waibel

TL;DR
This paper explores active learning strategies for translating a known, closed text into a severely low-resource language. It shows that randomly sampling the seed corpus outperforms translating it portion by portion, and proposes an iterative algorithm for updating the model as human post-edited data accumulates.
Contribution
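The paper compares strategies for choosing which part of the text to translate first in a low-resource setting, compares ways of updating the model with growing amounts of human post-edited data, and proposes an iterative algorithm for human and machine to translate a closed text together.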
Findings
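Random sampling outperforms the portion-based approach by +11.0 BLEU with English as a simulated low-resource language and by +4.9 BLEU with Eastern Pokomchi, a Mayan language (seed corpus of ~1,000 Bible lines, tested on the remaining ~30,000 lines).
Of the three model-update strategies compared, adding newly post-edited data to training after a vocabulary update performs best.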
The proposed algorithm enables seamless human-machine collaboration.
Abstract
We translate a closed text that is known in advance and available in many languages into a new and severely low resource language. Most human translation efforts adopt a portion-based approach to translate consecutive pages/chapters in order, which may not suit machine translation. We compare the portion-based approach that optimizes coherence of the text locally with the random sampling approach that increases coverage of the text globally. Our results show that the random sampling approach performs better. When training on a seed corpus of ~1,000 lines from the Bible and testing on the rest of the Bible (~30,000 lines), random sampling gives a performance gain of +11.0 BLEU using English as a simulated low resource language, and +4.9 BLEU using Eastern Pokomchi, a Mayan language. Furthermore, we compare three ways of updating machine translation models with increasing amount of human…
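The two seed-selection strategies compared in the abstract can be illustrated with a minimal Python sketch. It assumes the closed text is a list of lines (for the Bible, roughly one verse per line) and that a seed corpus of k lines is chosen either as one consecutive block (portion-based) or uniformly at random without replacement (random sampling). The function names and the `chunk_coverage` measure, which counts how many equal-size chunks of the text contain at least one seed line as a rough proxy for global coverage, are hypothetical illustrations, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple


def portion_based_indices(n_lines: int, k: int, start: int = 0) -> List[int]:
    """Portion-based selection: k consecutive lines starting at `start`.

    Keeps the seed text locally coherent (e.g. whole chapters in order).
    """
    if k < 0 or start < 0 or start + k > n_lines:
        raise ValueError("requested portion does not fit in the text")
    return list(range(start, start + k))


def random_indices(n_lines: int, k: int, seed: int = 0) -> List[int]:
    """Random selection: k distinct lines drawn uniformly from the whole text.

    Indices are sorted so the seed corpus keeps the original text order.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_lines), k))


def split_seed_and_test(lines: Sequence[str], seed_idx: List[int]) -> Tuple[List[str], List[str]]:
    """Seed = lines to be translated by humans; test = all remaining lines."""
    chosen = set(seed_idx)
    seed_lines = [lines[i] for i in seed_idx]
    test_lines = [line for i, line in enumerate(lines) if i not in chosen]
    return seed_lines, test_lines


def chunk_coverage(seed_idx: List[int], n_lines: int, n_chunks: int) -> float:
    """Hypothetical coverage proxy: fraction of equal-size chunks of the
    text that contain at least one seed line."""
    touched = {i * n_chunks // n_lines for i in seed_idx}
    return len(touched) / n_chunks


if __name__ == "__main__":
    # Placeholder corpus matching the scale in the abstract (~30,000 lines).
    corpus = [f"line {i}" for i in range(30000)]
    portion = portion_based_indices(len(corpus), 1000)
    sampled = random_indices(len(corpus), 1000, seed=0)
    seed_lines, test_lines = split_seed_and_test(corpus, sampled)
    print(len(seed_lines), len(test_lines))
    print("portion coverage:", chunk_coverage(portion, len(corpus), 60))
    print("random coverage:", chunk_coverage(sampled, len(corpus), 60))
```

With these placeholder sizes, the consecutive portion touches only 2 of the 60 chunks, while the random sample is spread across nearly all of them. This illustrates the local-coherence versus global-coverage trade-off the paper describes; the BLEU differences themselves come from the paper's experiments and are not reproduced here.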
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
