Confidence-Aware Scheduled Sampling for Neural Machine Translation
Yijin Liu, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou

TL;DR
This paper introduces confidence-aware scheduled sampling for neural machine translation, which decides at each target position whether the model is fed its own prediction or the ground-truth token according to the confidence of that prediction, leading to improved translation quality and faster convergence.
Contribution
It proposes a novel confidence-based schedule strategy for scheduled sampling, enhancing performance and training efficiency in neural machine translation.
Findings
Significantly outperforms vanilla scheduled sampling in translation quality.
Achieves faster convergence during training.
Effective across multiple language pairs and Transformer models.
Abstract
Scheduled sampling is an effective method to alleviate the exposure bias problem of neural machine translation. It simulates the inference scene by randomly replacing ground-truth target input tokens with predicted ones during training. Despite its success, its critical schedule strategies are merely based on training steps, ignoring the real-time model competence, which limits its potential performance and convergence speed. To address this issue, we propose confidence-aware scheduled sampling. Specifically, we quantify real-time model competence by the confidence of model predictions, based on which we design fine-grained schedule strategies. In this way, the model is exactly exposed to predicted tokens for high-confidence positions and still ground-truth tokens for low-confidence positions. Moreover, we observe vanilla scheduled sampling suffers from degenerating into the original…
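The per-position rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `confidence_aware_inputs`, the single `threshold` parameter and its default value are hypothetical, and confidence is assumed here to be the probability the model assigns to its own top prediction, which the abstract does not specify. The shifting of decoder inputs by one position is also left out for brevity.

```python
from typing import List, Sequence


def confidence_aware_inputs(
    gold: Sequence[int],
    probs: Sequence[Sequence[float]],
    threshold: float = 0.9,  # hypothetical cutoff, not a value from the paper
) -> List[int]:
    """Choose, position by position, which token the decoder is fed.

    gold[t]  : ground-truth token id at target position t
    probs[t] : predictive distribution over the vocabulary at position t,
               assumed to come from a first teacher-forced pass
    """
    if len(gold) != len(probs):
        raise ValueError("gold and probs must cover the same positions")

    mixed = []
    for gold_tok, dist in zip(gold, probs):
        # Token the model itself would predict at this position.
        pred_tok = max(range(len(dist)), key=dist.__getitem__)
        # Assumption: confidence = probability of the top prediction.
        confidence = dist[pred_tok]
        # High confidence: expose the model to its own prediction.
        # Low confidence: keep the ground-truth token.
        mixed.append(pred_tok if confidence >= threshold else gold_tok)
    return mixed


if __name__ == "__main__":
    gold = [5, 6, 7]
    probs = [[0.05, 0.95], [0.6, 0.4], [0.08, 0.92]]
    print(confidence_aware_inputs(gold, probs))
```

By contrast, vanilla scheduled sampling as described above would replace tokens at random with a probability that depends only on the training step, regardless of how confident the model is at each position.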
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Dense Connections · Adam · Layer Normalization
