Statistical Mechanical Analysis of Catastrophic Forgetting in Continual Learning with Teacher and Student Networks
Haruka Asanuma, Shiro Takagi, Yoshihiro Nagano, Yuki Yoshida, Yasuhiko Igarashi, and Masato Okada

TL;DR
This paper develops a theoretical framework that uses teacher-student networks to analyze catastrophic forgetting in continual learning. It identifies conditions that mitigate forgetting and reveals an overshoot phenomenon.
Contribution
It introduces a novel statistical mechanical analysis of catastrophic forgetting within the teacher-student learning paradigm, providing qualitative insights into task similarity effects.
Findings
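The network avoids forgetting when the similarity between the tasks' input distributions is low.
High similarity between the tasks' input-output relationships reduces forgetting.
An overshoot phenomenon lets performance recover after initial forgetting when training continues for a long time.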
Abstract
When a computational system continuously learns from an ever-changing environment, it rapidly forgets its past experiences. This phenomenon is called catastrophic forgetting. While a line of studies has been proposed with respect to avoiding catastrophic forgetting, most of the methods are based on intuitive insights into the phenomenon, and their performances have been evaluated by numerical experiments using benchmark datasets. Therefore, in this study, we provide the theoretical framework for analyzing catastrophic forgetting by using teacher-student learning. Teacher-student learning is a framework in which we introduce two neural networks: one neural network is a target function in supervised learning, and the other is a learning neural network. To analyze continual learning in the teacher-student framework, we introduce the similarity of the input distribution and the input-output…
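The teacher-student setup described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's model or analysis: it assumes linear teacher and student units, the same Gaussian input distribution for both tasks, and noise-free online gradient descent. All function names and hyperparameters (`make_teachers`, `forgetting`, `n`, `steps`, `lr`) are hypothetical choices made for illustration. Two teachers stand in for two tasks, and their weight overlap controls the similarity of the input-output relationship. The student learns teacher A, then teacher B, and its error on teacher A is measured before and after the second phase.

```python
import math
import random


def make_teachers(n, overlap, rng):
    """Two teacher weight vectors whose expected normalized overlap is `overlap`."""
    b_a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    c = math.sqrt(1.0 - overlap ** 2)
    b_b = [overlap * a + c * z for a, z in zip(b_a, noise)]
    return b_a, b_b


def gen_error(student, teacher):
    """Exact generalization error for linear units with inputs x ~ N(0, I)."""
    n = len(student)
    return 0.5 * sum((t - s) ** 2 for s, t in zip(student, teacher)) / n


def train(student, teacher, steps, lr, rng):
    """Online gradient descent: one fresh example per step, updating in place."""
    n = len(student)
    for _ in range(steps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = sum(t * xi for t, xi in zip(teacher, x)) / math.sqrt(n)  # teacher output
        s = sum(w * xi for w, xi in zip(student, x)) / math.sqrt(n)  # student output
        delta = lr * (y - s) / math.sqrt(n)
        for i in range(n):
            student[i] += delta * x[i]


def forgetting(overlap, n=50, steps=2000, lr=0.5, seed=0):
    """Return the student's error on task A before and after learning task B."""
    rng = random.Random(seed)
    b_a, b_b = make_teachers(n, overlap, rng)
    student = [0.0] * n
    train(student, b_a, steps, lr, rng)   # phase 1: learn task A
    err_before = gen_error(student, b_a)
    train(student, b_b, steps, lr, rng)   # phase 2: learn task B
    err_after = gen_error(student, b_a)
    return err_before, err_after


if __name__ == "__main__":
    for q in (0.0, 0.5, 0.9, 1.0):
        before, after = forgetting(q)
        print(f"overlap={q:.1f}  error on A before={before:.2e}  after={after:.2e}")
```

In this toy setting the error on task A after the second phase shrinks as the teacher overlap grows, which mirrors the qualitative finding about input-output similarity. It does not vary the input distributions and does not reproduce the overshoot phenomenon.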
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Machine Learning and ELM
