Diagnosing Catastrophe: Large parts of accuracy loss in continual learning can be accounted for by readout misalignment

Daniel Anthes; Sushrut Thorat; Peter König; Tim C. Kietzmann

arXiv:2310.05644·cs.LG·October 10, 2023

Open Access

TL;DR

This paper investigates the causes of accuracy loss in continual learning, revealing that readout misalignment and representational shifts are major factors, with implications for improving neural network robustness and biological alignment.

Contribution

It identifies readout misalignment as a key factor in catastrophic forgetting and characterizes the representational changes involved.

Findings

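01 Readout misalignment is the largest component of the accuracy loss.

02 Representational geometry is partially conserved despite the shift, and only a small part of the information is irrecoverably lost.

03 All types of representational change scale with the dimensionality of the hidden representations.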

Abstract

Unlike primates, artificial neural networks trained on changing data distributions show a rapid decrease in performance on old tasks. This phenomenon is commonly referred to as catastrophic forgetting. In this paper, we investigate the representational changes that underlie this performance decrease and identify three distinct processes that together account for the phenomenon. The largest component is a misalignment between hidden representations and readout layers. This misalignment arises because learning on additional tasks causes internal representations to shift. Representational geometry is partially conserved under this misalignment, and only a small part of the information is irrecoverably lost. All types of representational changes scale with the dimensionality of hidden representations. These insights have implications for deep learning applications that need to be…
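The distinction between readout misalignment and irrecoverable information loss can be illustrated with a toy NumPy sketch. This is not the authors' method or data: the synthetic two-class "hidden representations", the random orthogonal rotation standing in for a representational shift, and all function and variable names are assumptions made for illustration. The rotation preserves pairwise distances, so the geometry is fully conserved here, whereas the abstract reports only partial conservation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "hidden representations": two Gaussian classes in a 20-d space.
n_samples, n_hidden = 500, 20
labels = rng.integers(0, 2, size=n_samples)
class_means = 2.0 * rng.normal(size=(2, n_hidden))
hidden_old = class_means[labels] + rng.normal(size=(n_samples, n_hidden))


def fit_linear_readout(hidden, labels):
    """Least-squares linear readout (with bias) onto -1/+1 targets."""
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    targets = 2.0 * labels - 1.0
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def accuracy(weights, hidden, labels):
    """Fraction of samples whose readout sign matches the label."""
    design = np.hstack([hidden, np.ones((hidden.shape[0], 1))])
    predictions = (design @ weights > 0).astype(int)
    return float((predictions == labels).mean())


# Readout fitted while the representation is in its "old task" state.
readout_old = fit_linear_readout(hidden_old, labels)

# Assumed stand-in for a representational shift: a random orthogonal map.
rotation, _ = np.linalg.qr(rng.normal(size=(n_hidden, n_hidden)))
hidden_new = hidden_old @ rotation

acc_before = accuracy(readout_old, hidden_old, labels)
acc_stale = accuracy(readout_old, hidden_new, labels)  # misaligned readout
acc_refit = accuracy(fit_linear_readout(hidden_new, labels), hidden_new, labels)

print(f"old readout, old representation:     {acc_before:.3f}")
print(f"old readout, shifted representation: {acc_stale:.3f}")
print(f"refit readout, shifted representation: {acc_refit:.3f}")
```

In this sketch, any gap between the stale and refit readouts is due to misalignment alone, because the shifted representation still carries all of the class information. In a trained network, accuracy that a refit readout cannot restore would correspond to the irrecoverably lost part.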

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning