Confidence-Aware Scheduled Sampling for Neural Machine Translation

Yijin Liu; Fandong Meng; Yufeng Chen; Jinan Xu; Jie Zhou

arXiv:2107.10427 · cs.CL · July 23, 2021

TL;DR

This paper introduces confidence-aware scheduled sampling for neural machine translation, which dynamically adjusts training exposure based on model confidence, leading to improved translation quality and faster convergence.

Contribution

It proposes a novel confidence-based schedule strategy for scheduled sampling, enhancing performance and training efficiency in neural machine translation.

Findings

01 Significantly outperforms vanilla scheduled sampling in translation quality.

02 Achieves faster convergence during training.

03 Effective across multiple language pairs and Transformer models.

Abstract

Scheduled sampling is an effective method to alleviate the exposure bias problem of neural machine translation. It simulates the inference scene by randomly replacing ground-truth target input tokens with predicted ones during training. Despite its success, its critical schedule strategies are merely based on training steps, ignoring the real-time model competence, which limits its potential performance and convergence speed. To address this issue, we propose confidence-aware scheduled sampling. Specifically, we quantify real-time model competence by the confidence of model predictions, based on which we design fine-grained schedule strategies. In this way, the model is exactly exposed to predicted tokens for high-confidence positions and still ground-truth tokens for low-confidence positions. Moreover, we observe vanilla scheduled sampling suffers from degenerating into the original…
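The contrast drawn in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the fixed `threshold` value, and the reading of "confidence" as the probability the model assigns to its own predicted token are all assumptions made for illustration. Vanilla scheduled sampling swaps in predicted tokens at random with a probability that depends only on the training step, while the confidence-aware variant makes the choice per position from the model's confidence.

```python
import random


def vanilla_mix(gold, predicted, sample_prob, rng):
    """Vanilla scheduled sampling (sketch).

    Each gold token is replaced by the predicted token with probability
    `sample_prob`, which a step-based schedule would supply. The model's
    confidence plays no role.
    """
    return [p if rng.random() < sample_prob else g
            for g, p in zip(gold, predicted)]


def confidence_aware_mix(gold, predicted, confidence, threshold=0.9):
    """Confidence-aware mixing (sketch).

    `confidence[i]` is assumed to be the probability the model assigns to
    `predicted[i]`. `threshold` is a hypothetical cut-off chosen for this
    example. High-confidence positions receive the predicted token;
    low-confidence positions keep the ground-truth token.
    """
    return [p if c >= threshold else g
            for g, p, c in zip(gold, predicted, confidence)]


gold = ["the", "cat", "sat"]
predicted = ["a", "dog", "sat"]
confidence = [0.95, 0.40, 0.99]

# Positions 0 and 2 are above the threshold, position 1 is not.
mixed = confidence_aware_mix(gold, predicted, confidence)
print(mixed)  # ['a', 'cat', 'sat']
```

In this toy input the model is unsure about the second position, so the decoder input keeps the ground-truth token there and uses the model's own predictions elsewhere.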

Code & Models

Repositories

Adaxry/conf_aware_ss4nmt (tf, Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Residual Connection · Softmax · Dense Connections · Adam · Layer Normalization