Self-Guided Curriculum Learning for Neural Machine Translation

Lei Zhou; Liang Ding; Kevin Duh; Shinji Watanabe; Ryohei Sasano; Koichi Takeda

arXiv:2105.04475·cs.CL·August 30, 2021

TL;DR

This paper introduces a self-guided curriculum learning approach for neural machine translation that uses sentence-level BLEU scores to measure learning difficulty, leading to improved translation performance.

Contribution

The method uses sentence-level BLEU as its measure of learning difficulty, so it builds the NMT training curriculum without relying on linguistic prior knowledge or third-party language models.

Findings

01. Consistent performance improvements on WMT benchmarks.

02. Effective learning difficulty measurement via BLEU scores.

03. Outperforms strong Transformer baselines.

Abstract

In machine learning, a well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible. Inspired by this, we propose a self-guided curriculum strategy that encourages the learning of neural machine translation (NMT) models to follow this recovery criterion, where we cast the recovery degree of each training example as its learning difficulty. Specifically, we adopt the sentence-level BLEU score as the proxy for recovery degree. Unlike existing curricula that rely on linguistic prior knowledge or third-party language models, our chosen learning difficulty is better suited to measuring the degree of knowledge mastery of the NMT models. Experiments on translation benchmarks, including WMT14 English ⇒ German and WMT17 Chinese ⇒ English,…
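The recovery criterion in the abstract can be illustrated with a minimal Python sketch: score each training pair by the sentence-level BLEU between a model's translation and the reference, then order the pairs by that score. This is not the authors' implementation. The add-one smoothing, the easy-to-hard ordering, and the names `sentence_bleu`, `order_by_difficulty`, and `translate` (a hypothetical callable standing in for an NMT model) are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Sentence-level BLEU in [0, 1] with add-one smoothing (an assumption)."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp or not ref:
        return 0.0
    log_precision = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clipped overlap: each hypothesis n-gram counts at most as often
        # as it appears in the reference.
        overlap = sum((hyp_counts & ref_counts).values())
        total = sum(hyp_counts.values())
        # Add-one smoothing keeps the score non-zero for short sentences.
        log_precision += math.log((overlap + 1) / (total + 1)) / max_n
    # Brevity penalty for hypotheses shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision)


def order_by_difficulty(examples, translate):
    """Return (bleu, source, reference) triples, highest BLEU (easiest) first.

    `translate` is a hypothetical stand-in for the NMT model's decoder.
    """
    scored = [(sentence_bleu(translate(src), ref), src, ref)
              for src, ref in examples]
    return sorted(scored, key=lambda item: item[0], reverse=True)
```

Under these assumptions, a pair whose reference the model reproduces closely gets a high score and is treated as easy, while a pair the model translates poorly is treated as hard. How the ordered examples are then scheduled during training is not specified in the text above, so the sketch stops at the ordering step.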

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Softmax · Layer Normalization · Label Smoothing · Byte Pair Encoding