DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Arthur Douillard, Alexandre Ramé, Guillaume Couairon, Matthieu Cord

TL;DR
DyTox introduces a scalable transformer-based continual learning model that dynamically expands tokens to learn new tasks efficiently without forgetting, requiring minimal hyperparameter tuning and outperforming existing methods on large datasets.
Contribution
The paper presents a transformer architecture whose encoder and decoder are shared across all tasks and which expands a set of special tokens as tasks arrive, enabling scalable continual learning with little overhead and no hyperparameter tuning to control the expansion.
Findings
Achieves state-of-the-art results on ImageNet datasets.
Requires less memory and fewer parameters than existing dynamic methods.
Effectively scales to a large number of tasks without significant overhead.
Abstract
Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting efficiently in continual learning. However, existing approaches often require a task identifier at test-time, need complex tuning to balance the growing number of parameters, and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. Critically, the encoder and decoder are shared among all tasks. Through a dynamic expansion of special tokens, we specialize each forward of our decoder network on a task distribution. Our strategy scales to a large number of tasks while having…
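To make the mechanism in the abstract concrete, here is a minimal, dependency-free sketch of token expansion. It is an illustration under stated assumptions, not the authors' implementation: the class name `TokenExpansionSketch`, the single-matrix "encoder", the projection-free attention used as a "decoder", and the per-task linear heads are all hypothetical simplifications. What it aims to show is only the structure the abstract describes: the encoder and decoder are shared by every task, each new task adds one special token, and the decoder runs once per token.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


class TokenExpansionSketch:
    """Toy model: shared weights plus one token (and one head) per task.

    All names and shapes here are assumptions made for illustration.
    """

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        # Shared across all tasks: a single matrix standing in for the encoder.
        self.encoder_w = self._matrix(dim, dim)
        self.task_tokens = []  # one vector of size dim per task
        self.heads = []        # one (n_classes x dim) matrix per task (assumed)

    def _matrix(self, rows, cols):
        return [[self.rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]

    def shared_param_count(self):
        return self.dim * self.dim

    def add_task(self, n_classes):
        # Expansion step: only a new token and a new head are created.
        self.task_tokens.append([self.rng.gauss(0, 0.1) for _ in range(self.dim)])
        self.heads.append(self._matrix(n_classes, self.dim))

    def encode(self, patches):
        return [[dot(row, p) for row in self.encoder_w] for p in patches]

    def decode(self, task_token, features):
        # The task token is the query; it attends over itself and the features.
        seq = [task_token] + features
        scores = [dot(task_token, s) / math.sqrt(self.dim) for s in seq]
        weights = softmax(scores)
        return [sum(w * s[i] for w, s in zip(weights, seq)) for i in range(self.dim)]

    def forward(self, patches):
        features = self.encode(patches)  # computed once, reused for every task
        logits = []
        for token, head in zip(self.task_tokens, self.heads):
            embedding = self.decode(token, features)  # one decoder pass per token
            logits.extend(dot(row, embedding) for row in head)
        return logits


model = TokenExpansionSketch(dim=4)
patches = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.1, 0.0, 0.2]]

model.add_task(n_classes=2)
print(len(model.forward(patches)))  # 2

model.add_task(n_classes=3)
print(len(model.forward(patches)))  # 5
```

In this sketch, adding a task grows the model by one token of size `dim` plus one head, while the shared weights stay the same size. That is the sense in which an expansion of tokens can stay cheap compared with duplicating whole sub-networks per task.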
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis
