On the Diminishing Returns of Width for Continual Learning
Etash Guha, Vihan Lakshman

TL;DR
This paper analyzes how increasing neural network width affects catastrophic forgetting in continual learning, showing that wider networks reduce forgetting but with diminishing returns, supported by theoretical and empirical evidence.
Contribution
It provides one of the first theoretical frameworks linking network width to forgetting in feed-forward networks, revealing diminishing returns with increased width.
Findings
Wider networks exhibit less catastrophic forgetting.
The reduction in forgetting shows diminishing returns as width increases.
Empirical results, including widths not explored in prior studies, confirm the theoretical predictions.
Abstract
While deep neural networks have demonstrated groundbreaking performance in various settings, these models often suffer from catastrophic forgetting when trained on new tasks in sequence. Several works have empirically demonstrated that increasing the width of a neural network leads to a decrease in catastrophic forgetting but have yet to characterize the exact relationship between width and continual learning. We design one of the first frameworks to analyze Continual Learning Theory and prove that width is directly related to forgetting in Feed-Forward Networks (FFN). Specifically, we demonstrate that increasing network widths to reduce forgetting yields diminishing returns. We empirically verify our claims at widths hitherto unexplored in prior studies where the diminishing returns are clearly observed as predicted by our theory.
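The abstract does not say how forgetting is quantified. As an illustration only, the sketch below uses one common convention from the continual-learning literature: for each earlier task, take the accuracy measured right after training on that task and subtract the accuracy measured after training on the final task, then average. The function name `average_forgetting` and the accuracy-matrix layout are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def average_forgetting(acc: Sequence[Sequence[float]]) -> float:
    """Average drop in accuracy on earlier tasks after all tasks are learned.

    Assumed (hypothetical) layout: acc[i][j] is the accuracy on task j
    measured after training sequentially on tasks 0..i.
    """
    num_tasks = len(acc)
    if num_tasks < 2:
        # With a single task there is nothing to forget.
        return 0.0

    drops = []
    # The last task is excluded: it has just been trained, so no later
    # training could have degraded it yet.
    for j in range(num_tasks - 1):
        accuracy_when_learned = acc[j][j]
        accuracy_at_end = acc[num_tasks - 1][j]
        drops.append(accuracy_when_learned - accuracy_at_end)
    return sum(drops) / len(drops)


# Made-up accuracies for three sequential tasks (illustrative values only).
example_acc = [
    [0.90, 0.10, 0.10],
    [0.70, 0.85, 0.10],
    [0.60, 0.75, 0.80],
]
example_forgetting = average_forgetting(example_acc)
```

Computing this quantity for models of several widths and plotting it against width is one way to look for the flattening curve that diminishing returns would imply.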
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Sparse and Compressive Sensing Techniques
