The Effect of Architecture During Continual Learning

Allyson Hahn; Krishnan Raghavan

arXiv:2601.19766·cs.LG·January 28, 2026

TL;DR

This paper introduces a mathematical framework and optimization approach for jointly learning neural network architecture and weights during continual learning, significantly reducing catastrophic forgetting and improving performance.

Contribution

The paper develops a Sobolev-space framework and a bilevel optimization method, using derivative-free search and low-rank transfer, to adapt architecture and weights simultaneously.

Findings

01. Joint architecture and weight learning reduces forgetting.

02. Empirical results show up to 100x performance improvement.

03. Framework applies across various neural network types.

Abstract

Continual learning is a challenge for models with a static architecture, as they fail to adapt when data distributions evolve across tasks. We introduce a mathematical framework that jointly models architecture and weights in a Sobolev space, enabling a rigorous investigation into the role of neural network architecture in continual learning and its effect on the forgetting loss. We derive necessary conditions for the continual learning solution and prove that learning only model weights is insufficient to mitigate catastrophic forgetting under distribution shifts. Consequently, we prove that by learning the architecture and weights simultaneously at each task, we can reduce catastrophic forgetting. To learn weights and architecture simultaneously, we formulate continual learning as a bilevel optimization problem: the upper level selects an optimal architecture for a given task,…
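The bilevel structure described in the abstract can be illustrated with a toy sketch. This is not the authors' method. The abstract is truncated before it describes the lower level, so the sketch assumes the lower level fits weights for a fixed architecture. Everything else is also a hypothetical stand-in: the "architecture" is just the width of a random-feature hidden layer, the derivative-free search is plain enumeration over candidate widths, and forgetting is measured by validating on data from all tasks seen so far (which assumes earlier data can be replayed). Low-rank transfer is omitted.

```python
import numpy as np

def features(x, width, seed=0):
    # Hypothetical hidden layer: fixed random tanh features.
    # `width` stands in for the architecture choice.
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(1, width))
    b = rng.normal(size=width)
    return np.tanh(x[:, None] * W + b)

def inner_fit(x, y, width, ridge=1e-3):
    # Lower level (assumed): fit output weights for a fixed width
    # by ridge regression, which has a closed-form solution.
    H = features(x, width)
    A = H.T @ H + ridge * np.eye(width)
    return np.linalg.solve(A, H.T @ y)

def loss(x, y, width, w):
    # Mean squared error of the fitted model on (x, y).
    return float(np.mean((features(x, width) @ w - y) ** 2))

def outer_search(x_tr, y_tr, x_va, y_va, candidate_widths):
    # Upper level: derivative-free search, here simple enumeration,
    # keeping the width with the lowest validation loss.
    best = None
    for width in candidate_widths:
        w = inner_fit(x_tr, y_tr, width)
        val = loss(x_va, y_va, width, w)
        if best is None or val < best[0]:
            best = (val, width, w)
    return best

def make_task(lo, hi, freq, n=200, seed=0):
    # Toy task: inputs drawn from [lo, hi], targets sin(freq * x).
    rng = np.random.default_rng(seed)
    x = rng.uniform(lo, hi, size=n)
    return x, np.sin(freq * x)

def run(candidate_widths=(2, 8, 32, 128)):
    # Two tasks with shifted input ranges emulate a distribution shift.
    tasks = [make_task(-3.0, 0.0, 1.0, seed=1),
             make_task(0.0, 3.0, 3.0, seed=2)]
    x_tr, y_tr, x_va, y_va, history = [], [], [], [], []
    for x, y in tasks:
        half = len(x) // 2
        x_tr.append(x[:half]); y_tr.append(y[:half])
        x_va.append(x[half:]); y_va.append(y[half:])
        # Validation covers every task seen so far, so the selected
        # width is penalized if it forgets earlier tasks.
        val, width, _ = outer_search(
            np.concatenate(x_tr), np.concatenate(y_tr),
            np.concatenate(x_va), np.concatenate(y_va),
            candidate_widths)
        history.append((width, val))
    return history

if __name__ == "__main__":
    for task_id, (width, val) in enumerate(run()):
        print(f"task {task_id}: selected width={width}, val loss={val:.4f}")
```

Re-running the outer search at each task lets the selected width change as the data distribution shifts, which is the behavior a static architecture cannot provide. A practical system would replace the enumeration with a more scalable derivative-free optimizer and the closed-form fit with gradient-based training.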

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Generative Adversarial Networks and Image Synthesis