Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck

Andor Diera; Lukas Galke; Fabian Karl; Ansgar Scherp

arXiv:2412.08528 · cs.CL · March 10, 2026

Open Access

TL;DR

This paper presents a discrete key-value bottleneck (DKVB) approach for encoder-only language models that enables efficient continual learning by reducing catastrophic forgetting and lowering computational costs.

Contribution

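The paper adapts the discrete key-value bottleneck from vision to NLP, compares different bottleneck architectures for encoder-only language models, introduces a new task-independent initialization technique for the discrete keys, and demonstrates the approach's effectiveness across four continual learning scenarios.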
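This page does not describe how the task-independent key initialization works. As a hedged illustration only, the sketch below assumes a recipe in the spirit of the vision DKVB: split each frozen-encoder feature vector into heads and fit one k-means codebook per head on features of generic, non-task text, so the keys do not depend on any particular task. The function `init_keys_kmeans`, its parameters, and the choice of plain Lloyd's k-means are assumptions for illustration, not the authors' code.

```python
import numpy as np


def init_keys_kmeans(features, num_heads, num_keys, iters=10, seed=0):
    """Hypothetical task-independent key initialization (not the authors' code).

    features: (n, d) frozen-encoder embeddings of generic, non-task text.
    Returns keys of shape (num_heads, num_keys, d // num_heads), one
    k-means codebook per head.
    """
    n, d = features.shape
    if d % num_heads != 0:
        raise ValueError("feature dim must be divisible by num_heads")
    if n < num_keys:
        raise ValueError("need at least num_keys feature vectors")
    head_dim = d // num_heads
    rng = np.random.default_rng(seed)
    # Split every vector into num_heads contiguous chunks: (n, heads, head_dim).
    chunks = features.reshape(n, num_heads, head_dim)
    keys = np.empty((num_heads, num_keys, head_dim))
    for h in range(num_heads):
        x = chunks[:, h, :]
        # Lloyd's k-means, seeded with randomly chosen data points (a copy).
        centers = x[rng.choice(n, size=num_keys, replace=False)]
        for _ in range(iters):
            dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            assign = dists.argmin(axis=1)
            for k in range(num_keys):
                members = x[assign == k]
                if len(members) > 0:  # empty clusters keep their old center
                    centers[k] = members.mean(axis=0)
        keys[h] = centers
    return keys


# Usage: random vectors stand in for encoder features of generic text.
rng = np.random.default_rng(0)
generic_features = rng.normal(size=(1000, 64))
keys = init_keys_kmeans(generic_features, num_heads=8, num_keys=32)
print(keys.shape)  # (8, 32, 8)
```

Because the codebooks are fit once on task-agnostic data, the same keys can in principle be reused unchanged for every task in a continual learning sequence.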

Findings

1. DKVB alleviates catastrophic forgetting in NLP models.
2. DKVB achieves performance competitive with popular continual learning methods at a lower computational cost.
3. DKVB remains effective in challenging scenarios where no task IDs are available.

Abstract

Continual learning remains a challenge across various natural language processing (NLP) tasks, as models updated with new training data often risk catastrophic forgetting of previously acquired knowledge. We introduce a discrete key-value bottleneck (DKVB) for encoder-only language models, enabling efficient continual learning through localized updates. Inspired by a discrete key-value bottleneck in vision, we consider new and NLP-specific challenges. We compare different bottleneck architectures for NLP and introduce a new, task-independent initialization technique for the discrete keys. We evaluate our DKVB for NLP in four continual learning scenarios and show that it alleviates catastrophic forgetting. Our experiments demonstrate that the proposed approach achieves competitive performance compared to popular continual learning methods while incurring lower computational costs.…
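To make "localized updates" concrete, here is a minimal sketch under stated assumptions: a frozen encoder's output is split into heads, each head is snapped to its nearest frozen key, the values attached to the selected keys are retrieved and mean-pooled, and only those values are trained. The class `DiscreteKVBottleneck`, mean pooling over heads, and plain SGD are illustrative choices, not the paper's implementation.

```python
import numpy as np


class DiscreteKVBottleneck:
    """Hypothetical minimal DKVB sketch (not the authors' code).

    keys:   (heads, num_keys, head_dim), frozen after initialization
    values: (heads, num_keys, value_dim), the only trainable parameters here
    """

    def __init__(self, keys, value_dim, seed=0):
        self.keys = keys
        heads, num_keys, _ = keys.shape
        rng = np.random.default_rng(seed)
        self.values = rng.normal(size=(heads, num_keys, value_dim))

    def lookup(self, z):
        """Return the index of the nearest key per head, shape (batch, heads)."""
        heads, _, head_dim = self.keys.shape
        chunks = z.reshape(z.shape[0], heads, head_dim)
        # Squared Euclidean distance to every key of the same head.
        dists = ((chunks[:, :, None, :] - self.keys[None, :, :, :]) ** 2).sum(-1)
        return dists.argmin(axis=-1)

    def forward(self, z):
        """Retrieve one value per head and mean-pool them: (batch, value_dim)."""
        idx = self.lookup(z)
        head_ids = np.arange(self.keys.shape[0])
        retrieved = self.values[head_ids[None, :], idx]  # (batch, heads, value_dim)
        return retrieved.mean(axis=1), idx

    def sgd_step(self, z, grad_out, lr=0.1):
        """Plain SGD on the values given dLoss/dOutput of shape (batch, value_dim).

        The pooled output depends on a value only if its key was selected,
        with derivative 1/heads, so every other value stays untouched.
        """
        idx = self.lookup(z)
        heads = self.keys.shape[0]
        for b in range(z.shape[0]):
            for h in range(heads):
                self.values[h, idx[b, h]] -= lr * grad_out[b] / heads


# Usage: 4 heads, 32 keys per head, 8 dims per head (so z has 32 dims).
rng = np.random.default_rng(1)
dkvb = DiscreteKVBottleneck(rng.normal(size=(4, 32, 8)), value_dim=3)
z = rng.normal(size=(2, 32))
before = dkvb.values.copy()
out, idx = dkvb.forward(z)
dkvb.sgd_step(z, grad_out=np.ones_like(out))
changed = np.any(dkvb.values != before, axis=-1).sum()
print(f"{changed} of {4 * 32} value vectors updated")
```

Because the gradient of the pooled output reaches only the values of the selected keys, a batch of B inputs changes at most B × heads value vectors in this sketch; everything else, including the keys, stays fixed. That sparsity is the intuition behind why such a bottleneck can limit interference with what was learned on earlier tasks.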

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques