Routing without Forgetting
Alessio Masano, Giovanni Bellitto, Dipam Goswami, Joost van de Weijer, Concetto Spampinato

TL;DR
The paper introduces Routing without Forgetting (RwF), a novel transformer architecture that uses energy-based associative retrieval layers for dynamic input routing, significantly improving online continual learning performance without task-specific prompts.
Contribution
It presents RwF, a new method that embeds energy-based associative routing within transformers, enabling effective online continual learning without explicit task identifiers or repeated optimization.
Findings
Outperforms existing prompt-based methods on class-incremental benchmarks.
Achieves large-margin improvements on Split-ImageNet-R and Split-ImageNet-S.
Effective in few-shot learning regimes.
Abstract
Continual learning in transformers is commonly addressed through parameter-efficient adaptation: prompts, adapters, or LoRA modules are specialized per task while the backbone remains frozen. Although effective in controlled multi-epoch settings, these approaches rely on gradual gradient-based specialization and struggle in Online Continual Learning (OCL), where data arrive as a non-stationary stream and each sample may be observed only once. We recast continual learning in transformers as a routing problem: under strict online constraints, the model must dynamically select the appropriate representational subspace for each input without explicit task identifiers or repeated optimization. We thus introduce Routing without Forgetting (RwF), a transformer architecture augmented with energy-based associative retrieval layers inspired by Modern Hopfield Networks. Instead of storing or…
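The abstract is cut off before it describes the retrieval layer itself. The following is therefore only a sketch of the standard Modern Hopfield Network retrieval step that RwF is said to be inspired by, not the paper's actual layer. Stored patterns are the columns of a matrix X, a query ξ is mapped to X·softmax(β Xᵀξ), and the softmax weights are read as soft routing scores over the stored patterns. The helper names (`routing_weights`, `hopfield_retrieve`, `energy`), the inverse temperature β = 8, and the use of the largest weight as a "routing decision" are illustrative assumptions.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()


def routing_weights(patterns: np.ndarray, query: np.ndarray, beta: float) -> np.ndarray:
    """Hypothetical helper: softmax(beta * X^T xi), one weight per stored pattern.

    patterns: (d, N) matrix X whose columns are the stored patterns.
    query:    (d,) state vector xi.
    """
    return softmax(beta * patterns.T @ query)


def hopfield_retrieve(patterns: np.ndarray, query: np.ndarray, beta: float, steps: int = 1) -> np.ndarray:
    """Modern Hopfield update xi <- X softmax(beta X^T xi), applied `steps` times."""
    xi = query
    for _ in range(steps):
        xi = patterns @ routing_weights(patterns, xi, beta)
    return xi


def energy(patterns: np.ndarray, xi: np.ndarray, beta: float) -> float:
    """E(xi) = -lse(beta, X^T xi) + 0.5 * xi^T xi, with xi-independent constants dropped."""
    a = beta * patterns.T @ xi
    lse = (a.max() + np.log(np.exp(a - a.max()).sum())) / beta
    return float(-lse + 0.5 * xi @ xi)


# Usage: four orthonormal stored patterns and a noisy copy of the third one.
rng = np.random.default_rng(0)
X = np.eye(4)
q = X[:, 2] + 0.1 * rng.standard_normal(4)
w = routing_weights(X, q, beta=8.0)
route = int(np.argmax(w))  # illustrative: index of the pattern the query is "routed" to
retrieved = hopfield_retrieve(X, q, beta=8.0)
```

An update of this form never increases the energy E, which is why the retrieved state ends up closer to the stored pattern than the noisy query. A larger β sharpens the weights, so the soft routing approaches a hard selection of a single stored pattern.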
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis
