Efficient Continual Learning for Small Language Models with a Discrete Key-Value Bottleneck
Andor Diera, Lukas Galke, Fabian Karl, Ansgar Scherp

TL;DR
This paper presents a discrete key-value bottleneck (DKVB) approach for encoder-only language models that enables efficient continual learning by reducing catastrophic forgetting and lowering computational costs.
Contribution
The paper introduces a novel DKVB architecture for NLP, including a task-independent initialization technique, and demonstrates its effectiveness across multiple continual learning scenarios.
Findings
DKVB alleviates catastrophic forgetting in NLP models.
DKVB performs competitively with popular continual learning methods at lower computational cost.
DKVB remains effective without task IDs in challenging scenarios.
Abstract
Continual learning remains a challenge across various natural language processing (NLP) tasks, as models updated with new training data often risk catastrophic forgetting of previously acquired knowledge. We introduce a discrete key-value bottleneck (DKVB) for encoder-only language models, enabling efficient continual learning through localized updates. Inspired by a discrete key-value bottleneck in vision, we consider new and NLP-specific challenges. We compare different bottleneck architectures for NLP and introduce a new, task-independent initialization technique for the discrete keys. We evaluate our DKVB for NLP in four continual learning scenarios and show that it alleviates catastrophic forgetting. Our experiments demonstrate that the proposed approach achieves competitive performance compared to popular continual learning methods while incurring lower computational costs.…
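To make the idea of "localized updates" concrete, here is a minimal NumPy sketch of a generic single-codebook key-value bottleneck. It is an illustration under stated assumptions, not the architecture from the paper: the class name `KeyValueBottleneck`, the random key initialization, the single codebook, and the plain SGD step are all hypothetical choices made for brevity. Frozen encoder features are snapped to their nearest discrete key, each key indexes a learnable value, and a training step only touches the values whose keys were selected.

```python
import numpy as np


class KeyValueBottleneck:
    """Toy discrete key-value bottleneck (illustrative, not the paper's code)."""

    def __init__(self, dim, num_keys, value_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Keys are fixed after initialization (assumption: random init here).
        self.keys = rng.normal(size=(num_keys, dim))
        # Values are the only trainable parameters in this sketch.
        self.values = np.zeros((num_keys, value_dim))

    def lookup(self, z):
        """Return the index of the nearest key for each row of z (batch, dim)."""
        sq_dist = ((z[:, None, :] - self.keys[None, :, :]) ** 2).sum(axis=-1)
        return sq_dist.argmin(axis=1)

    def forward(self, z):
        """Map encoder features to the values stored under their nearest keys."""
        idx = self.lookup(z)
        return self.values[idx], idx

    def update(self, idx, grad, lr=0.1):
        """SGD step that modifies only the rows of `values` selected by idx."""
        # ufunc.at accumulates correctly when the same key is hit more than once.
        np.subtract.at(self.values, idx, lr * grad)


# Usage: two inputs that coincide with keys 2 and 5 update only those two values.
bottleneck = KeyValueBottleneck(dim=4, num_keys=8, value_dim=3)
features = bottleneck.keys[[2, 5]].copy()
outputs, selected = bottleneck.forward(features)
bottleneck.update(selected, grad=np.ones_like(outputs))
changed_rows = np.flatnonzero(np.abs(bottleneck.values).sum(axis=1) > 0)
```

Because values attached to unselected keys are never written, inputs from a later task that fall on different keys leave earlier knowledge untouched, which is the intuition behind reduced forgetting in this kind of design.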
Taxonomy
Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques
