TL;DR
This paper introduces an efficient continual learning method that regularizes the internal embeddings of a neural network, reducing memory and computational cost while maintaining performance. It also examines how the choice of activation function affects forgetting.
Contribution
It proposes a novel embedding regularization technique combined with a dynamic sampling strategy for scalable continual learning.
Findings
Outperforms state-of-the-art methods in accuracy
Requires less memory and computational time
Analyzes the impact of activation functions on catastrophic forgetting
Abstract
Continual learning of deep neural networks is a key requirement for scaling them up to more complex application scenarios and for achieving real lifelong learning of these architectures. Previous approaches to the problem have either progressively increased the size of the networks or tried to regularize the network behavior to equalize it with respect to previously observed tasks. In the latter case, it is essential to understand what type of information best represents this past behavior. Common techniques include regularizing the past outputs, gradients, or individual weights. In this work, we propose a new, relatively simple and efficient method to perform continual learning by instead regularizing the network's internal embeddings. To make the approach scalable, we also propose a dynamic sampling strategy to reduce the memory footprint of the required external…
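The idea of regularizing internal embeddings can be illustrated with a minimal sketch. This is not the paper's implementation: the one-layer `embed` function, the squared-distance penalty, the weight `lam`, and all variable names are assumptions chosen for illustration, and the dynamic sampling strategy is not modeled here because the abstract above is truncated before describing it.

```python
import numpy as np

def embed(x, W):
    """Hypothetical internal embedding: a single tanh hidden layer."""
    return np.tanh(x @ W)

def embedding_penalty(W, x_mem, z_mem):
    """Mean squared distance between current and stored embeddings.

    x_mem: inputs kept from earlier tasks (assumed external memory).
    z_mem: embeddings of x_mem recorded when the earlier task finished.
    """
    diff = embed(x_mem, W) - z_mem
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def total_loss(task_loss, W, x_mem, z_mem, lam=1.0):
    """New-task loss plus the weighted embedding penalty (lam is assumed)."""
    return task_loss + lam * embedding_penalty(W, x_mem, z_mem)

rng = np.random.default_rng(0)
W_old = rng.normal(size=(4, 3))      # weights after the previous task
x_mem = rng.normal(size=(5, 4))      # stored past inputs
z_mem = embed(x_mem, W_old)          # stored past embeddings

# Weights that drifted while training on a new task.
W_new = W_old + 0.1 * rng.normal(size=W_old.shape)

penalty_same = embedding_penalty(W_old, x_mem, z_mem)
penalty_drift = embedding_penalty(W_new, x_mem, z_mem)
```

Under these assumptions the penalty is zero while the weights are unchanged and grows as the embeddings of the stored inputs move away from their recorded values, which is what discourages forgetting during training on a new task.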
