Continual Learning in Deep Networks: an Analysis of the Last Layer

Timothée Lesort; Thomas George; Irina Rish

arXiv:2106.01834 · cs.LG · August 19, 2022 · 1 cites

TL;DR

This paper investigates how different output layer parameterizations in deep neural networks influence learning and forgetting in continual learning, and proposes solutions whose effectiveness depends on how the data distribution drifts.

Contribution

It provides a detailed analysis of how the output layer contributes to catastrophic forgetting and evaluates parameterization strategies that improve continual learning without introducing a dedicated continual-learning algorithm.

Findings

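01. Changing the output layer parameterization can mitigate forgetting.

02. The best-performing output layer depends on the data distribution drifts and/or the amount of data available.

03. In some cases where a standard linear layer fails, changing the parameterization alone yields significantly better performance.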

Abstract

We study how different output layer parameterizations of a deep neural network affect learning and forgetting in continual learning settings. The following three effects can cause catastrophic forgetting in the output layer: (1) weight modifications, (2) interference, and (3) projection drift. In this paper, our goal is to provide more insights into how changing the output layer parameterization may address (1) and (2). Some potential solutions to those issues are proposed and evaluated here in several continual learning scenarios. We show that the best-performing type of output layer depends on the data distribution drifts and/or the amount of data available. In particular, in some cases where a standard linear layer would fail, changing parameterization is sufficient to achieve a significantly better performance, without introducing any continual-learning algorithm but instead by…
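
As a rough illustration of why the last-layer parameterization can matter, the sketch below contrasts a standard linear output layer with a cosine-similarity layer. The cosine layer is an assumption chosen for illustration; the abstract does not say which parameterizations the paper evaluates. The function names and toy values are hypothetical. The toy case mimics one way weight modifications could hurt: if training on new data inflates the norm of one class vector, a linear layer may flip its prediction, while a norm-invariant layer does not.

```python
import numpy as np

def linear_logits(W, b, h):
    """Standard linear output layer: one logit per class, W @ h + b."""
    return W @ h + b

def cosine_logits(W, h, eps=1e-12):
    """Hypothetical alternative: cosine similarity between the feature
    vector h and each class vector (row of W); ignores vector norms."""
    W_unit = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    h_unit = h / (np.linalg.norm(h) + eps)
    return W_unit @ h_unit

# Toy setup (illustrative values only): 2 classes, 2-d features.
W = np.array([[1.0, 0.0],
              [0.6, 0.8]])
b = np.zeros(2)
h = np.array([1.0, 0.2])  # feature pointing mostly toward class 0

# Assume later training inflated the norm of the class-1 vector.
W_grown = W.copy()
W_grown[1] *= 3.0

pred_linear_before = int(np.argmax(linear_logits(W, b, h)))
pred_linear_after = int(np.argmax(linear_logits(W_grown, b, h)))
pred_cosine_before = int(np.argmax(cosine_logits(W, h)))
pred_cosine_after = int(np.argmax(cosine_logits(W_grown, h)))

print("linear:", pred_linear_before, "->", pred_linear_after)
print("cosine:", pred_cosine_before, "->", pred_cosine_after)
```

In this toy case the linear prediction changes once the class-1 norm grows, whereas the cosine prediction is unaffected. This says nothing about which layer performs better on real continual-learning benchmarks.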

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Machine Learning and ELM

Methods: Linear Layer · Stochastic Gradient Descent