Improving Scheduled Sampling with Elastic Weight Consolidation for Neural Machine Translation

Michalis Korakakis; Andreas Vlachos

arXiv:2109.06308 · cs.CL · January 11, 2023

TL;DR

This paper improves neural machine translation by combining scheduled sampling with Elastic Weight Consolidation to reduce exposure bias and catastrophic forgetting, leading to better translation performance.

Contribution

It introduces a novel method integrating Elastic Weight Consolidation with scheduled sampling to address exposure bias and forgetting in neural machine translation.

Findings

01 The proposed method outperforms MLE and scheduled sampling baselines.

02 It alleviates catastrophic forgetting in translation models.

03 It yields significant improvements on the IWSLT'14 and WMT'14 datasets.

Abstract

Despite strong performance in many sequence-to-sequence tasks, autoregressive models trained with maximum likelihood estimation suffer from exposure bias, i.e. the discrepancy between the ground-truth prefixes used during training and the model-generated prefixes used at inference time. Scheduled sampling is a simple and empirically successful approach which addresses this issue by incorporating model-generated prefixes into training. However, it has been argued that it is an inconsistent training objective leading to models ignoring the prefixes altogether. In this paper, we conduct systematic experiments and find that scheduled sampling, while it ameliorates exposure bias by increasing model reliance on the input sequence, worsens performance when the prefix at inference time is correct, a form of catastrophic forgetting. We propose to use Elastic Weight Consolidation to better…
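The two ingredients named in the abstract can be sketched generically as follows. This is a minimal illustration of textbook scheduled sampling (mixing gold and model-generated prefix tokens) and the standard Elastic Weight Consolidation penalty, not the authors' implementation. The function names, the flat parameter lists, the per-parameter Fisher estimates, the weight `lam`, and the choice of an earlier trained model as the anchor are all assumptions made for illustration.

```python
import random


def mix_prefix(gold, predicted, p_model, rng):
    """Scheduled sampling (sketch): at each position, use the model's own
    token with probability p_model, otherwise keep the ground-truth token."""
    return [pred if rng.random() < p_model else g
            for g, pred in zip(gold, predicted)]


def ewc_penalty(params, anchor_params, fisher, lam):
    """Standard EWC regulariser: (lam / 2) * sum_i F_i * (theta_i - theta*_i)^2.
    anchor_params are the weights to stay close to (assumed here to come from
    an earlier trained model); fisher holds hypothetical per-parameter
    importance estimates."""
    return 0.5 * lam * sum(f * (p - a) ** 2
                           for p, a, f in zip(params, anchor_params, fisher))


def total_loss(task_loss, params, anchor_params, fisher, lam):
    """Training objective (sketch): loss on mixed prefixes plus the EWC term."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)


rng = random.Random(0)
gold = ["the", "cat", "sat"]
predicted = ["a", "cat", "sits"]
mixed = mix_prefix(gold, predicted, p_model=0.5, rng=rng)  # some gold, some model tokens
penalty = ewc_penalty([1.0, 2.0], [0.0, 2.0], [2.0, 5.0], lam=1.0)
```

With `p_model=0.0` the prefix is purely ground truth (ordinary teacher forcing), and with `p_model=1.0` it is purely model-generated. The penalty grows only for parameters that both moved away from the anchor and carry a high importance estimate, which is how EWC discourages forgetting while still allowing the model to adapt.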

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications