On the Diminishing Returns of Width for Continual Learning

Etash Guha; Vihan Lakshman

arXiv:2403.06398 · cs.LG · June 21, 2024 · 1 citation

TL;DR

This paper analyzes how increasing neural network width affects catastrophic forgetting in continual learning, showing that wider networks reduce forgetting but with diminishing returns, supported by theoretical and empirical evidence.

Contribution

The paper provides one of the first theoretical frameworks linking network width to forgetting in feed-forward networks, and it shows that widening a network to reduce forgetting yields diminishing returns.

Findings

01 Wider networks reduce catastrophic forgetting.

02 The reduction in forgetting shows diminishing returns as width increases.

03 Experiments at widths not explored in prior studies confirm the theoretical predictions.

Abstract

While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from catastrophic forgetting when trained on new tasks in sequence. Several works have empirically demonstrated that increasing the width of a neural network leads to a decrease in catastrophic forgetting but have yet to characterize the exact relationship between width and continual learning. We design one of the first frameworks to analyze Continual Learning Theory and prove that width is directly related to forgetting in Feed-Forward Networks (FFN). Specifically, we demonstrate that increasing network widths to reduce forgetting yields diminishing returns. We empirically verify our claims at widths hitherto unexplored in prior studies where the diminishing returns are clearly observed as predicted by our theory.
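
The abstract does not say how forgetting is measured, so the following is only a minimal sketch of one common convention from the continual learning literature, not necessarily the metric used in the paper. It assumes forgetting on a task is the gap between the best accuracy reached on that task at an earlier stage and its accuracy after the final task. The function name `average_forgetting` and the accuracy values are hypothetical and purely illustrative; they are not results from the paper.

```python
def average_forgetting(acc):
    """Average forgetting over all tasks except the last one.

    acc[t][i] is the accuracy on task i measured after training on task t
    (0-indexed). Only entries with i <= t are read.
    Assumed definition: for each earlier task, take the best accuracy seen
    before the final stage and subtract the accuracy after the final stage.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        # With a single task there is nothing that could be forgotten.
        return 0.0
    final = acc[num_tasks - 1]
    drops = []
    for i in range(num_tasks - 1):
        # Best accuracy on task i at any stage from t = i up to the
        # second-to-last stage.
        best_earlier = max(acc[t][i] for t in range(i, num_tasks - 1))
        drops.append(best_earlier - final[i])
    return sum(drops) / len(drops)


# Hypothetical accuracies for three sequential tasks (illustrative only).
example_acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.92, 0.00],
    [0.70, 0.85, 0.91],
]
print(average_forgetting(example_acc))
```

Under this assumed definition, comparing the value across networks of increasing width is one way the diminishing-returns claim could be inspected empirically.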

Code & Models

Repositories

vihan-lakshman/diminishing-returns-wide-continual-learning (PyTorch, official)

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques