Scheduled Sampling Based on Decoding Steps for Neural Machine Translation

Yijin Liu; Fandong Meng; Yufeng Chen; Jinan Xu; Jie Zhou

arXiv:2108.12963·cs.CL·September 1, 2021

TL;DR

This paper introduces decoding step-based scheduled sampling methods for neural machine translation, which better simulate inference errors during training and improve translation quality across multiple tasks.

Contribution

The paper proposes scheduled sampling techniques based on decoding steps. They address a limitation of vanilla scheduled sampling, which treats all decoding steps equally, and they improve translation performance.

Findings

01. Significant improvements over baseline and vanilla scheduled sampling.

02. Effective across multiple large-scale translation tasks.

03. Generalizes well to text summarization benchmarks.

Abstract

Scheduled sampling is widely used to mitigate the exposure bias problem for neural machine translation. Its core motivation is to simulate the inference scene during training by replacing ground-truth tokens with predicted tokens, thus bridging the gap between training and inference. However, vanilla scheduled sampling is based only on training steps and treats all decoding steps equally. That is, it simulates an inference scene with uniform error rates, which does not match the real inference scene, where later decoding steps usually have higher error rates due to error accumulation. To alleviate this discrepancy, we propose scheduled sampling methods based on decoding steps, increasing the chance of selecting predicted tokens as the decoding step grows. Consequently, we can more realistically simulate the inference scene during training, thus better bridging the gap between…
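The core idea in the abstract, that predicted tokens should be selected more often at later decoding steps, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration: the exponential form `k ** t`, the default `k`, and the function names `ground_truth_prob` and `mix_decoder_inputs` are hypothetical and are not taken from this page or confirmed as the authors' formulation. The predicted tokens are simply passed in as a list; how they are obtained (for example, from an earlier decoding pass) is outside the sketch.

```python
import random


def ground_truth_prob(t, k=0.99):
    """Probability of feeding the ground-truth token at decoding step t (0-indexed).

    Assumed schedule: exponential decay k ** t, so later steps are more
    likely to receive a predicted token. This form is illustrative only.
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    return k ** t


def mix_decoder_inputs(gold, predicted, k=0.99, rng=None):
    """Build decoder inputs by choosing, per decoding step, gold or predicted.

    gold and predicted are equal-length token sequences. At step t the gold
    token is kept with probability ground_truth_prob(t, k); otherwise the
    predicted token is used.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    rng = rng or random.Random()
    mixed = []
    for t, (g, p) in enumerate(zip(gold, predicted)):
        keep_gold = rng.random() < ground_truth_prob(t, k)
        mixed.append(g if keep_gold else p)
    return mixed


if __name__ == "__main__":
    gold = ["we", "propose", "a", "new", "method", "."]
    predicted = ["we", "suggest", "the", "new", "approach", "."]
    print(mix_decoder_inputs(gold, predicted, k=0.8, rng=random.Random(0)))
```

By contrast, a schedule that depends only on the training step would use the same probability at every position `t`, which corresponds to the uniform error rate the abstract criticizes.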

Code & Models

Repositories

adaxry/ss_on_decoding_steps (tf, Official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Dropout · Layer Normalization · Dense Connections · Byte Pair Encoding · Softmax