Overcoming Catastrophic Forgetting beyond Continual Learning: Balanced Training for Neural Machine Translation

Chenze Shao; Yang Feng

arXiv:2203.03910·cs.CL·March 21, 2022

TL;DR

This paper identifies imbalanced training as a cause of catastrophic forgetting in neural machine translation, even in static training, and proposes a method called COKD to mitigate this issue, leading to improved translation performance.

Contribution

It introduces the concept of imbalanced training as a cause of catastrophic forgetting in static neural network training and proposes COKD, a novel knowledge distillation approach to address it.
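
For readers unfamiliar with knowledge distillation in general, the sketch below shows a generic token-level distillation loss: the student is trained on a mix of the usual negative log-likelihood of the gold token and a cross-entropy term against a teacher's output distribution. This is not COKD itself, whose details are not given on this page. The function names and the interpolation weight `alpha` are illustrative assumptions.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_probs, gold_index, alpha=0.5):
    """Generic distillation loss for one target token (illustrative, not COKD).

    student_logits: raw student scores over the vocabulary
    teacher_probs:  teacher's probability distribution over the same vocabulary
    gold_index:     index of the reference token
    alpha:          hypothetical weight on the teacher term (assumption)
    """
    student_probs = softmax(student_logits)
    # Standard negative log-likelihood of the gold token.
    nll = -math.log(student_probs[gold_index])
    # Cross-entropy between the teacher and student distributions.
    kd = -sum(t * math.log(s) for t, s in zip(teacher_probs, student_probs))
    return (1.0 - alpha) * nll + alpha * kd


# Toy example with a three-word vocabulary.
loss = distillation_loss([2.0, 0.5, -1.0], [0.7, 0.2, 0.1], gold_index=0)
print(f"loss = {loss:.4f}")
```

Setting `alpha` to 0 recovers ordinary maximum-likelihood training, while larger values make the student follow the teacher more closely.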

Findings

01. COKD effectively alleviates imbalanced training.

02. Experimental results show substantial improvements in translation quality.

03. The method outperforms strong baseline systems across multiple tasks.

Abstract

Neural networks tend to gradually forget the previously learned knowledge when learning multiple tasks sequentially from dynamic data distributions. This problem is called catastrophic forgetting, which is a fundamental challenge in the continual learning of neural networks. In this work, we observe that catastrophic forgetting not only occurs in continual learning but also affects the traditional static training. Neural networks, especially neural machine translation models, suffer from catastrophic forgetting even if they learn from a static training set. To be specific, the final model pays imbalanced attention to training samples, where recently exposed samples attract more attention than earlier samples. The underlying cause is that training samples do not get balanced training in each model update, so we name this problem imbalanced training. To alleviate this…

Code & Models

Repositories

ictnlp/cokd (PyTorch, official)

Taxonomy

Methods: Knowledge Distillation