The Effect of Architecture During Continual Learning
Allyson Hahn, Krishnan Raghavan

TL;DR
This paper introduces a mathematical framework and optimization approach for jointly learning neural network architecture and weights during continual learning, significantly reducing catastrophic forgetting and improving performance.
Contribution
It develops a Sobolev space-based framework and a bilevel optimization method with a derivative-free search and low-rank transfer to adapt architecture and weights simultaneously.
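This summary does not say how the low-rank transfer works, so the following is only a minimal sketch of one way weights could be carried between architectures of different widths: a truncated SVD of the old weight matrix, zero-padded or cropped to the new width. The function name `low_rank_transfer` and its `rank` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np

def low_rank_transfer(W_old, new_width, rank):
    """Hypothetical transfer: rank-limited copy of W_old resized to new_width columns.

    W_old has shape (d_in, old_width). This is an assumed scheme for illustration,
    not the paper's method.
    """
    U, S, Vt = np.linalg.svd(W_old, full_matrices=False)
    r = min(rank, len(S))
    core = U[:, :r] * S[:r]            # (d_in, r): left factors scaled by singular values
    Vt_r = Vt[:r]                      # (r, old_width): right factors
    old_width = W_old.shape[1]
    if new_width >= old_width:
        # Widening: new units start at zero, old units keep their low-rank weights.
        pad = np.zeros((r, new_width - old_width))
        Vt_new = np.hstack([Vt_r, pad])
    else:
        # Narrowing: keep only the first new_width units.
        Vt_new = Vt_r[:, :new_width]
    return core @ Vt_new               # (d_in, new_width)

# Usage: widen a 3x4 layer to 6 units while keeping a rank-2 approximation.
W = np.arange(12, dtype=float).reshape(3, 4)
W_wide = low_rank_transfer(W, new_width=6, rank=2)
```

Zero-padding keeps the function computed by the old units unchanged up to the rank truncation, which is one plausible reason to prefer it over random re-initialization; other choices are equally possible.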
Findings
Joint architecture and weight learning reduces forgetting.
Empirical results show up to 100x performance improvement.
Framework applies across various neural network types.
Abstract
Continual learning is a challenge for models with static architecture, as they fail to adapt when data distributions evolve across tasks. We introduce a mathematical framework that jointly models architecture and weights in a Sobolev space, enabling a rigorous investigation into the role of neural network architecture in continual learning and its effect on the forgetting loss. We derive necessary conditions for the continual learning solution and prove that learning only model weights is insufficient to mitigate catastrophic forgetting under distribution shifts. Consequently, we prove that by learning the architecture and weights simultaneously at each task, we can reduce catastrophic forgetting. To learn weights and architecture simultaneously, we formulate continual learning as a bilevel optimization problem: the upper level selects an optimal architecture for a given task,…
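The bilevel structure described in the abstract can be illustrated with a small sketch. It assumes a one-hidden-layer tanh network whose only architectural choice is its hidden width, a lower level that fits weights by plain gradient descent, and an upper level that picks the width by exhaustive search over a candidate list (a simple derivative-free search). All function names (`train_weights`, `mse`, `select_architecture`) and hyperparameters are hypothetical; the paper's actual search procedure and loss are not given in this summary.

```python
import numpy as np

def train_weights(width, X, y, steps=200, lr=0.05, seed=0):
    """Lower level (assumed): gradient descent on half-MSE for a fixed hidden width."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], width))
    w2 = rng.normal(scale=0.5, size=width)
    n = len(y)
    for _ in range(steps):
        H = np.tanh(X @ W1)                    # hidden activations, shape (n, width)
        err = H @ w2 - y                       # residuals, shape (n,)
        grad_w2 = H.T @ err / n
        grad_W1 = X.T @ (np.outer(err, w2) * (1.0 - H ** 2)) / n
        w2 -= lr * grad_w2
        W1 -= lr * grad_W1
    return W1, w2

def mse(params, X, y):
    """Mean squared error of the network defined by params on (X, y)."""
    W1, w2 = params
    return float(np.mean((np.tanh(X @ W1) @ w2 - y) ** 2))

def select_architecture(candidate_widths, X_train, y_train, X_val, y_val):
    """Upper level (assumed): derivative-free search over candidate hidden widths."""
    best = None
    for width in candidate_widths:
        params = train_weights(width, X_train, y_train)   # solve the lower level
        loss = mse(params, X_val, y_val)                  # score the architecture
        if best is None or loss < best[1]:
            best = (width, loss, params)
    return best

# Usage on one synthetic task: choose a width, then keep its trained weights.
rng = np.random.default_rng(1)
X = rng.normal(size=(64, 2))
y = np.sin(X[:, 0])
width, val_loss, params = select_architecture([2, 4, 8], X[:48], y[:48], X[48:], y[48:])
```

In a continual setting this selection would be repeated as each new task arrives, with the upper-level score extended to include performance on earlier tasks; that extension is left out here to keep the sketch short.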
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Generative Adversarial Networks and Image Synthesis
