Discrete Key-Value Bottleneck
Frederik Träuble, Anirudh Goyal, Nasim Rahaman, Michael Mozer, Kenji Kawaguchi, Yoshua Bengio, Bernhard Schölkopf

TL;DR
This paper introduces a discrete key-value bottleneck architecture for continual learning, which reduces catastrophic forgetting by enabling sparse, context-dependent updates and reusing learned representations, validated through theoretical analysis and empirical experiments.
Contribution
The work proposes a novel discrete bottleneck model with key-value codes that mitigates forgetting in continual learning without requiring task boundaries.
Findings
Reduces catastrophic forgetting in class-incremental learning scenarios.
Outperforms relevant baselines across various pre-trained models.
Theoretically shows reduced hypothesis complexity under distribution shifts.
Abstract
Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams such as continual learning. One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm will be to encode; process the representation via a discrete bottleneck; and decode. Here, the input is fed to the pre-trained encoder, the…
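The encode, bottleneck, decode pipeline named in the abstract can be illustrated with a small sketch. This is a minimal NumPy toy under stated assumptions, not the authors' implementation: the frozen encoder is stood in for by a fixed random projection, there is a single codebook, keys are frozen and matched by squared Euclidean distance, and the fetched value is used directly as class logits. All names (`encode`, `nearest_key`, `train_step`, the dimensions) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for illustration.
D_IN, D_ENC, N_KEYS, N_CLASSES = 8, 4, 16, 3

# Assumption: a fixed random projection stands in for a frozen pre-trained encoder.
W_enc = rng.normal(size=(D_IN, D_ENC))

# Assumption: keys are frozen; only the values are learned.
keys = rng.normal(size=(N_KEYS, D_ENC))
values = np.zeros((N_KEYS, N_CLASSES))


def encode(x):
    """Frozen encoder stand-in: never updated during training."""
    return np.tanh(x @ W_enc)


def nearest_key(z):
    """Discrete bottleneck: index of the key closest to the encoding."""
    distances = ((keys - z) ** 2).sum(axis=1)
    return int(np.argmin(distances))


def forward(x):
    """Encode, select one key, and return its value as class logits."""
    idx = nearest_key(encode(x))
    return idx, values[idx]


def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()


def train_step(x, label, lr=0.5):
    """One cross-entropy gradient step that touches only the selected value."""
    idx, logits = forward(x)
    grad = softmax(logits)      # d(loss)/d(logits) for cross-entropy ...
    grad[label] -= 1.0          # ... is softmax(logits) - one_hot(label)
    values[idx] -= lr * grad    # sparse update: one row of the value table
    return idx


x = np.ones(D_IN)
snapshot = values.copy()
updated_row = train_step(x, label=2)
rows_changed = np.flatnonzero(np.any(values != snapshot, axis=1))
```

In this toy, a training step modifies exactly one row of `values`, so inputs that map to other keys keep their stored predictions. That locality is the intuition behind the "sparse, context-dependent updates" mentioned in the TL;DR.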
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques · Machine Learning and Data Classification
