Token-wise Curriculum Learning for Neural Machine Translation
Chen Liang, Haoming Jiang, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Tuo Zhao

TL;DR
This paper introduces a token-wise curriculum learning method for neural machine translation that improves training efficiency and translation quality, especially for low-resource languages, by gradually expanding target subsequences during training.
Contribution
The paper proposes a novel token-wise curriculum learning approach that creates sufficient easy samples and adapts to low-resource scenarios, outperforming existing methods.
Findings
Outperforms baselines on 5 language pairs.
Especially effective for low-resource languages.
Combining with sentence-level methods further improves results.
Abstract
Existing curriculum learning approaches to Neural Machine Translation (NMT) require sampling sufficient amounts of "easy" samples from the training data at the early stage of training. This is not always achievable for low-resource languages, where the amount of training data is limited. To address this limitation, we propose a novel token-wise curriculum learning approach that creates sufficient amounts of easy samples. Specifically, the model learns to predict a short sub-sequence from the beginning of each target sentence at the early stage of training, and the sub-sequence is then gradually expanded as training progresses. This curriculum design is inspired by the cumulative effect of translation errors, which makes later tokens more difficult to predict than earlier ones. Extensive experiments show that our approach can consistently outperform baselines on 5…
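The core idea of the abstract, training first on a short prefix of each target sentence and then lengthening it, can be illustrated with a small sketch. This is not the authors' implementation: the linear schedule, the `init_ratio` and `total_curriculum_steps` parameters, and the hard 0/1 loss mask are hypothetical choices made for illustration, since the abstract does not specify the exact schedule or weighting.

```python
import math
from typing import List


def prefix_length(step: int, total_curriculum_steps: int, target_len: int,
                  init_ratio: float = 0.2) -> int:
    """Number of leading target tokens included in the loss at `step`.

    Hypothetical linear schedule: start with roughly `init_ratio` of the
    sentence and grow to the full sentence by `total_curriculum_steps`.
    """
    if target_len <= 0:
        return 0
    if total_curriculum_steps <= 0:
        return target_len  # no curriculum: train on the whole sentence
    # Initial prefix: at least one token, never longer than the sentence.
    init_k = min(target_len, max(1, int(init_ratio * target_len)))
    # Integer arithmetic keeps the schedule free of float round-off.
    progressed = min(step, total_curriculum_steps)
    return init_k + (target_len - init_k) * progressed // total_curriculum_steps


def loss_mask(step: int, total_curriculum_steps: int, target_len: int,
              init_ratio: float = 0.2) -> List[float]:
    """1.0 for tokens inside the current prefix, 0.0 for later tokens."""
    k = prefix_length(step, total_curriculum_steps, target_len, init_ratio)
    return [1.0 if i < k else 0.0 for i in range(target_len)]


def masked_nll(token_nll: List[float], mask: List[float]) -> float:
    """Average per-token negative log-likelihood over unmasked positions."""
    kept = sum(mask)
    return sum(loss * m for loss, m in zip(token_nll, mask)) / kept
```

For a 10-token target with `init_ratio=0.2` and a 10,000-step curriculum, this sketch computes the loss on the first 2 tokens at step 0, the first 6 at step 5,000, and all 10 from step 10,000 onward. Because later tokens are masked out early, every sentence yields an "easy" prefix-prediction sample, which is how the approach avoids having to find enough easy sentences in a small corpus.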
