Order parameters and phase transitions of continual learning in deep neural networks
Haozhe Shan, Qianyi Li, Haim Sompolinsky

TL;DR
This paper develops a statistical-mechanics theory for continual learning in deep neural networks, identifying key order parameters that predict forgetting and interference, and suggests architectural strategies to mitigate these issues.
Contribution
It introduces a theoretical framework with order parameters that characterize task relations and network architecture effects on continual learning performance.
Findings
Increasing network depth reduces task interference.
Task similarity predicts phase transitions in learning performance.
Low task similarity can cause catastrophic anterograde interference, in which learning of new tasks is impaired.
Abstract
Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. While various techniques exist to mitigate forgetting, theoretical insights into when and why CL fails in NNs are lacking. Here, we present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks. It gives rise to order parameters (OPs) that capture how task relations and network architecture influence forgetting and anterograde interference, as verified by numerical evaluations. For networks with a shared readout for all tasks (single-head CL), the relevant-feature and rule similarity between tasks, respectively measured by two OPs, are sufficient to predict a wide…
Taxonomy
Topics: Neural Networks and Applications
