Order parameters and phase transitions of continual learning in deep neural networks

Haozhe Shan; Qianyi Li; Haim Sompolinsky

arXiv:2407.10315 · cs.LG · January 28, 2025 · 1 citation

TL;DR

This paper develops a statistical-mechanics theory of continual learning in deep neural networks, identifies key order parameters that predict forgetting and interference, and suggests architectural strategies for mitigating them.

Contribution

It introduces a theoretical framework whose order parameters characterize how task relations and network architecture affect continual learning performance.

Findings

01

Increasing network depth reduces task interference.

02

Task similarity predicts phase transitions in learning performance.

03

Sufficiently low task similarity can cause catastrophic anterograde interference: the network retains old tasks but fails to generalize on new ones.

Abstract

Continual learning (CL) enables animals to learn new tasks without erasing prior knowledge. CL in artificial neural networks (NNs) is challenging due to catastrophic forgetting, where new learning degrades performance on older tasks. While various techniques exist to mitigate forgetting, theoretical insights into when and why CL fails in NNs are lacking. Here, we present a statistical-mechanics theory of CL in deep, wide NNs, which characterizes the network's input-output mapping as it learns a sequence of tasks. It gives rise to order parameters (OPs) that capture how task relations and network architecture influence forgetting and anterograde interference, as verified by numerical evaluations. For networks with a shared readout for all tasks (single-head CL), the relevant-feature and rule similarity between tasks, respectively measured by two OPs, are sufficient to predict a wide…
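The abstract's claim for single-head CL, that similarity between task rules shapes how much a shared readout forgets, can be illustrated with a far simpler model. The sketch below is a hypothetical toy, not the paper's theory or its order parameters. A single linear readout learns two regression tasks in sequence, each task is generated by a "teacher" weight vector, and the cosine similarity `rho` between the two teachers stands in for rule similarity. All function names and parameters (`train_sequentially`, `toy_forgetting`, `dim`, `n_train`) are illustrative assumptions.

```python
import numpy as np


def train_sequentially(tasks, dim):
    """Fit each (X, y) task in turn, starting from the previous weights.

    Each step applies the minimum-norm weight change that fits the current
    task exactly; for an underdetermined linear model this is the point that
    gradient descent on squared error converges to from the previous weights.
    Returns the weight vector after each task.
    """
    w = np.zeros(dim)
    snapshots = []
    for X, y in tasks:
        residual = y - X @ w
        w = w + np.linalg.pinv(X) @ residual  # min-norm correction
        snapshots.append(w.copy())
    return snapshots


def toy_forgetting(rho, dim=200, n_train=50, seed=0):
    """Increase in task-1 error caused by learning task 2 (hypothetical toy).

    rho is the cosine similarity between the two teacher vectors, used here
    only as an illustrative stand-in for "rule similarity".
    """
    rng = np.random.default_rng(seed)
    # Orthonormal pair (u, v) used to build teachers with cosine similarity rho.
    u = rng.standard_normal(dim)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(dim)
    v -= (v @ u) * u
    v /= np.linalg.norm(v)
    teacher1 = u
    teacher2 = rho * u + np.sqrt(1.0 - rho**2) * v

    X1 = rng.standard_normal((n_train, dim))
    X2 = rng.standard_normal((n_train, dim))
    w_after1, w_after2 = train_sequentially(
        [(X1, X1 @ teacher1), (X2, X2 @ teacher2)], dim
    )

    # For isotropic Gaussian test inputs x ~ N(0, I), the expected squared
    # error of weights w on a teacher t equals ||w - t||^2.
    def gen_error(w, t):
        return float(np.sum((w - t) ** 2))

    return gen_error(w_after2, teacher1) - gen_error(w_after1, teacher1)


if __name__ == "__main__":
    for rho in (0.0, 0.5, 0.9, 1.0):
        print(f"rho={rho:.1f}  forgetting={toy_forgetting(rho):+.3f}")
```

Using the pseudo-inverse yields the converged gradient-descent solution directly, which keeps the toy exact and fast. In this setting forgetting tends to shrink as `rho` grows, and at `rho = 1` it cannot be positive, because learning the second task can only remove components of the remaining task-1 error. The paper's analysis of deep, wide nonlinear networks goes well beyond this linear case.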

Taxonomy

Topics: Neural Networks and Applications