Does the Adam Optimizer Exacerbate Catastrophic Forgetting?
Dylan R. Ashley, Sina Ghiassian, Richard S. Sutton

TL;DR
This paper investigates how different optimization algorithms, especially Adam versus SGD, influence catastrophic forgetting in neural networks, revealing that classical methods sometimes outperform modern ones and emphasizing the need for rigorous measurement metrics.
Contribution
It provides empirical evidence that optimizer choice significantly affects catastrophic forgetting and highlights the importance of using multiple metrics for accurate assessment.
Findings
Classical SGD can cause less forgetting than Adam in some cases.
The choice of forgetting metrics dramatically influences study conclusions.
A comprehensive evaluation requires multiple, concurrent metrics.
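To make the comparison in these findings concrete, the following is a minimal sketch of the standard textbook update rules for vanilla SGD and Adam, written in plain Python. It is not the authors' experimental code. The function names `sgd_step` and `adam_step` and the default hyperparameters are assumptions chosen for illustration.

```python
import math

def sgd_step(w, grad, lr=0.1):
    """One vanilla SGD update: w <- w - lr * grad."""
    return [wi - lr * gi for wi, gi in zip(w, grad)]

def adam_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update with bias correction; t is the 1-based step count.

    m and v are running estimates of the first and second gradient moments.
    """
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Bias correction compensates for m and v being initialised at zero.
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v

# Illustrative usage (hypothetical values): the SGD step scales with the
# gradient, while the first Adam step has magnitude close to lr regardless
# of the gradient's scale.
w_sgd = sgd_step([1.0], [2.0], lr=0.1)
w_adam_small, _, _ = adam_step([1.0], [2.0], [0.0], [0.0], t=1)
w_adam_large, _, _ = adam_step([1.0], [200.0], [0.0], [0.0], t=1)
```

The per-parameter normalisation by the second-moment estimate is the main structural difference between the two rules. Whether and how that difference relates to forgetting is the empirical question the paper studies; this sketch does not demonstrate it.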
Abstract
Catastrophic forgetting remains a severe hindrance to the broad application of artificial neural networks (ANNs); however, it continues to be a poorly understood phenomenon. Despite the extensive amount of work on catastrophic forgetting, we argue that it is still unclear how exactly the phenomenon should be quantified, and, moreover, to what degree all of the choices we make when designing learning systems affect the amount of catastrophic forgetting. We use various testbeds from the reinforcement learning and supervised learning literature to (1) provide evidence that the choice of which modern gradient-based optimization algorithm is used to train an ANN has a significant impact on the amount of catastrophic forgetting and show that, surprisingly, in many instances classical algorithms such as vanilla SGD experience less catastrophic forgetting than the more modern algorithms such as…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Reinforcement Learning in Robotics · Artificial Intelligence in Games
Methods: Adam · Stochastic Gradient Descent
