Improving Scheduled Sampling with Elastic Weight Consolidation for Neural Machine Translation
Michalis Korakakis, Andreas Vlachos

TL;DR
This paper improves neural machine translation by combining scheduled sampling with Elastic Weight Consolidation to reduce exposure bias and catastrophic forgetting, leading to better translation performance.
Contribution
It introduces a novel method integrating Elastic Weight Consolidation with scheduled sampling to address exposure bias and forgetting in neural machine translation.
Findings
The method outperforms the MLE and scheduled sampling baselines.
It alleviates the catastrophic forgetting that scheduled sampling causes in translation models.
It yields significant improvements on the IWSLT'14 and WMT'14 datasets.
Abstract
Despite strong performance in many sequence-to-sequence tasks, autoregressive models trained with maximum likelihood estimation suffer from exposure bias, i.e. the discrepancy between the ground-truth prefixes used during training and the model-generated prefixes used at inference time. Scheduled sampling is a simple and empirically successful approach which addresses this issue by incorporating model-generated prefixes into training. However, it has been argued that it is an inconsistent training objective leading to models ignoring the prefixes altogether. In this paper, we conduct systematic experiments and find that scheduled sampling, while it ameliorates exposure bias by increasing model reliance on the input sequence, worsens performance when the prefix at inference time is correct, a form of catastrophic forgetting. We propose to use Elastic Weight Consolidation to better…
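The two ingredients named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-token mixing rule, and the toy parameter lists are assumptions made for illustration, and the penalty is the standard quadratic form of Elastic Weight Consolidation, which may differ in detail from how the paper applies it.

```python
import random


def mix_prefix(gold, predicted, gold_prob, rng):
    """Scheduled-sampling style prefix (hypothetical helper).

    For each position, keep the ground-truth token with probability
    `gold_prob`; otherwise use the model's own prediction.
    """
    return [g if rng.random() < gold_prob else p
            for g, p in zip(gold, predicted)]


def ewc_penalty(params, anchor_params, fisher, strength):
    """Standard EWC quadratic penalty (hypothetical helper).

    Computes (strength / 2) * sum_i F_i * (theta_i - theta_star_i)^2,
    where `anchor_params` are the weights to stay close to (for example,
    those of an MLE-trained model) and `fisher` holds per-parameter
    importance estimates.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    gold = ["the", "cat", "sat"]
    predicted = ["a", "cat", "sits"]
    # Training prefix that mixes gold and model-generated tokens.
    print(mix_prefix(gold, predicted, gold_prob=0.5, rng=rng))
    # Penalty added to the training loss to discourage drifting
    # away from the anchor weights.
    print(ewc_penalty([1.0, 2.0], [1.0, 0.0], [3.0, 0.5], strength=2.0))
```

In this sketch, lowering `gold_prob` exposes the model to more of its own predictions during training, while the penalty term pulls important weights back toward the anchor model, which is the intuition behind using the two together.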
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
