An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks
Ian J. Goodfellow, Mehdi Mirza, Da Xiao, Aaron Courville, Yoshua Bengio

TL;DR
This paper empirically studies catastrophic forgetting in neural networks, comparing training algorithms and activation functions, and finds dropout consistently best for balancing old and new task performance.
Contribution
It provides a comprehensive empirical analysis of catastrophic forgetting, highlighting the effectiveness of dropout and the importance of cross-validating activation functions.
Findings
Dropout outperforms other algorithms in balancing task retention and adaptation.
Task relationships significantly influence activation function performance.
Activation function choice should be cross-validated for different tasks.
Abstract
Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between…
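The protocol the abstract describes (train on a first task, then on a second, and compare performance on both) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the synthetic tasks, the feature permutation used to build the second task, the one-hidden-layer ReLU network, the hyperparameters, and all function names. Any numbers it prints are not the paper's results.

```python
import numpy as np

def make_task(rng, n=400, dim=8, perm=None):
    """Hypothetical toy task: label = 1 if the first half of the features sums to > 0."""
    X = rng.normal(size=(n, dim))
    y = (X[:, : dim // 2].sum(axis=1) > 0).astype(float)
    if perm is not None:
        X = X[:, perm]  # second task: same rule, shuffled input columns
    return X, y

def init_params(rng, dim, hidden):
    return {"W1": rng.normal(scale=0.5, size=(dim, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(scale=0.5, size=hidden), "b2": 0.0}

def forward(params, X, rng=None, p_drop=0.0):
    pre = X @ params["W1"] + params["b1"]
    mask = np.ones_like(pre)
    if rng is not None and p_drop > 0.0:
        # Inverted dropout: rescale kept units so no scaling is needed at test time.
        mask = (rng.random(pre.shape) >= p_drop) / (1.0 - p_drop)
    h = np.maximum(0.0, pre) * mask
    logits = np.clip(h @ params["W2"] + params["b2"], -30.0, 30.0)
    return pre, h, mask, 1.0 / (1.0 + np.exp(-logits))

def train(params, X, y, rng, epochs=200, lr=0.1, p_drop=0.0):
    """Full-batch gradient descent on the logistic loss."""
    n = len(y)
    for _ in range(epochs):
        pre, h, mask, prob = forward(params, X, rng, p_drop)
        d_logits = (prob - y) / n
        d_pre = np.outer(d_logits, params["W2"]) * mask * (pre > 0)
        g_W2, g_b2 = h.T @ d_logits, d_logits.sum()
        g_W1, g_b1 = X.T @ d_pre, d_pre.sum(axis=0)
        params["W1"] -= lr * g_W1
        params["b1"] -= lr * g_b1
        params["W2"] -= lr * g_W2
        params["b2"] -= lr * g_b2

def accuracy(params, X, y):
    prob = forward(params, X)[3]  # no dropout at evaluation time
    return float(((prob > 0.5) == (y > 0.5)).mean())

def run_experiment(p_drop, seed=0):
    rng = np.random.default_rng(seed)
    dim = 8
    perm = rng.permutation(dim)
    X_old, y_old = make_task(rng, dim=dim)
    X_new, y_new = make_task(rng, dim=dim, perm=perm)
    params = init_params(rng, dim, hidden=32)
    train(params, X_old, y_old, rng, p_drop=p_drop)   # first ("old") task
    old_before = accuracy(params, X_old, y_old)
    train(params, X_new, y_new, rng, p_drop=p_drop)   # second ("new") task
    return {"old_before": old_before,
            "old_after": accuracy(params, X_old, y_old),
            "new_after": accuracy(params, X_new, y_new)}

if __name__ == "__main__":
    for p in (0.0, 0.5):
        print(f"p_drop={p}:", run_experiment(p))
```

The gap between `old_before` and `old_after` is one simple way to quantify forgetting in this toy setting, and `new_after` measures adaptation to the second task. A sketch this small should not be expected to reproduce the paper's conclusion about dropout.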
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Advanced Neural Network Applications
