Discrete Key-Value Bottleneck

Frederik Träuble; Anirudh Goyal; Nasim Rahaman; Michael Mozer; Kenji Kawaguchi; Yoshua Bengio; Bernhard Schölkopf

arXiv:2207.11240 · cs.LG · June 13, 2023 · 5 cites

TL;DR

This paper introduces a discrete key-value bottleneck architecture for continual learning, which reduces catastrophic forgetting by enabling sparse, context-dependent updates and reusing learned representations, validated through theoretical analysis and empirical experiments.

Contribution

The work proposes a novel discrete bottleneck model with key-value codes that mitigates forgetting in continual learning without requiring task boundaries.

Findings

01. Reduces catastrophic forgetting in class-incremental learning scenarios.

02. Outperforms relevant baselines across various pre-trained models.

03. Theoretically shows reduced hypothesis complexity under distribution shifts.

Abstract

Deep neural networks perform well on classification tasks where data streams are i.i.d. and labeled data is abundant. Challenges emerge with non-stationary training data streams such as continual learning. One powerful approach that has addressed this challenge involves pre-training of large encoders on volumes of readily available data, followed by task-specific tuning. Given a new task, however, updating the weights of these encoders is challenging as a large number of weights needs to be fine-tuned, and as a result, they forget information about the previous tasks. In the present work, we propose a model architecture to address this issue, building upon a discrete bottleneck containing pairs of separate and learnable key-value codes. Our paradigm will be to encode; process the representation via a discrete bottleneck; and decode. Here, the input is fed to the pre-trained encoder, the…
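The encode, bottleneck, decode pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the identity `encode` stand-in, the single codebook, the frozen random keys, the squared-error update, and every name below are assumptions made only for illustration. It shows one property claimed in the TL;DR, namely that an update touches only the value paired with the selected key.

```python
import random


def encode(x):
    # Hypothetical stand-in for a frozen pre-trained encoder (identity here).
    return x


def nearest_key(z, keys):
    """Return the index of the key closest to z in squared Euclidean distance."""
    def sq_dist(k):
        return sum((zi - ki) ** 2 for zi, ki in zip(z, k))
    return min(range(len(keys)), key=lambda i: sq_dist(keys[i]))


class KeyValueBottleneck:
    """Toy single-codebook bottleneck with fixed keys and learnable values."""

    def __init__(self, num_pairs, key_dim, value_dim, seed=0):
        rng = random.Random(seed)
        # Assumption: keys stay fixed after initialisation; only values change.
        self.keys = [[rng.gauss(0, 1) for _ in range(key_dim)]
                     for _ in range(num_pairs)]
        self.values = [[0.0] * value_dim for _ in range(num_pairs)]

    def forward(self, z):
        # Discrete step: pick the nearest key, then fetch its paired value.
        idx = nearest_key(z, self.keys)
        return idx, self.values[idx]

    def update(self, z, target, lr=0.5):
        # One gradient step on 0.5 * ||value - target||^2 for the selected
        # value only; all other key-value pairs are left untouched.
        idx, v = self.forward(z)
        self.values[idx] = [vi + lr * (ti - vi) for vi, ti in zip(v, target)]
        return idx


bottleneck = KeyValueBottleneck(num_pairs=8, key_dim=2, value_dim=2)
before = [v[:] for v in bottleneck.values]

# Treat the fetched value as class scores (an identity "decoder", assumed).
idx = bottleneck.update(encode([0.3, -1.2]), target=[1.0, 0.0])

changed = [i for i, (a, b) in enumerate(zip(before, bottleneck.values)) if a != b]
print("selected pair:", idx, "changed pairs:", changed)
```

Because the lookup is a hard nearest-neighbour choice, inputs that fall near different keys write to different values, which is one intuition for why such a bottleneck could limit interference between tasks.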

Code & Models

Repositories

ftraeuble/experiments_discrete_key_value_bottleneck (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Data Stream Mining Techniques · Machine Learning and Data Classification