DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion

Arthur Douillard; Alexandre Ramé; Guillaume Couairon; Matthieu Cord

arXiv:2111.11326 · cs.CV · August 9, 2022 · 1 citation

TL;DR

DyTox is a transformer-based continual learning model that dynamically expands a set of special tokens to learn new tasks while limiting forgetting of earlier ones. It scales to many tasks, requires minimal hyperparameter tuning, and outperforms existing methods on large datasets.

Contribution

The paper presents a transformer architecture whose encoder and decoder are shared across all tasks and whose special tokens are expanded dynamically, one set per task. This enables continual learning that scales with minimal overhead and needs no hyperparameter tuning to control the expansion.

Findings

01. Achieves state-of-the-art results on ImageNet datasets.

02. Requires less memory and fewer parameters than existing dynamic methods.

03. Effectively scales to a large number of tasks without significant overhead.

Abstract

Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting efficiently in continual learning. However, existing approaches often require a task identifier at test-time, need complex tuning to balance the growing number of parameters, and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. Critically, the encoder and decoder are shared among all tasks. Through a dynamic expansion of special tokens, we specialize each forward of our decoder network on a task distribution. Our strategy scales to a large number of tasks while having…
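The token-expansion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the official repository is in PyTorch); it is a dependency-free toy that assumes an identity function as the shared encoder, a single attention step with no learned projections as the shared decoder, and one small linear head per task. The class name TinyTokenExpansionModel and all sizes are hypothetical.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinyTokenExpansionModel:
    """Toy model: shared encoder/decoder, one extra token and head per task."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.task_tokens = []  # one vector per task seen so far
        self.heads = []        # one weight matrix (rows = classes) per task

    def _rand_vec(self):
        return [self.rng.uniform(-0.1, 0.1) for _ in range(self.dim)]

    def add_task(self, num_classes):
        # Expansion step: only a token and a classifier head are added.
        self.task_tokens.append(self._rand_vec())
        self.heads.append([self._rand_vec() for _ in range(num_classes)])

    def encode(self, patches):
        # Assumption: identity stand-in for the shared encoder.
        return patches

    def decode(self, task_token, patch_tokens):
        # Assumption: the task token is the query and attends over itself
        # plus the patch tokens; learned projections are omitted.
        keys = [task_token] + patch_tokens
        scale = math.sqrt(self.dim)
        weights = softmax([dot(task_token, k) / scale for k in keys])
        return [sum(w * k[j] for w, k in zip(weights, keys))
                for j in range(self.dim)]

    def forward(self, patches):
        tokens = self.encode(patches)  # computed once, reused by every task
        logits = []
        for token, head in zip(self.task_tokens, self.heads):
            embedding = self.decode(token, tokens)  # one decoder pass per task
            logits.extend(dot(row, embedding) for row in head)
        return logits  # per-task logits, concatenated


model = TinyTokenExpansionModel(dim=4)
patches = [[0.1, 0.2, 0.3, 0.4], [0.0, -0.1, 0.2, 0.1]]
model.add_task(num_classes=3)
print(len(model.forward(patches)))  # 3
model.add_task(num_classes=2)
print(len(model.forward(patches)))  # 5
```

In this sketch, adding a task grows the model by one token of size dim plus one small head, while the encoder and decoder code paths stay the same for every task. No task identifier is needed at test time, because the decoder is run once per task token and the resulting logits are concatenated.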

Code & Models

Repositories

arthurdouillard/dytox
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Face recognition and analysis