Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks
Yuliang Cai, Mohammad Rostami

TL;DR
This paper introduces TAM-CL, a transformer-based continual learning framework for multimodal vision-and-language (VaL) tasks. It dynamically expands the model and transfers knowledge across tasks to mitigate catastrophic forgetting, and it achieves state-of-the-art results.
Contribution
It proposes a scalable, dynamic transformer architecture with task-specific parameters and knowledge distillation for effective continual learning in multimodal settings.
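Knowledge distillation in continual learning usually means keeping the updated model's predictions close to those of a frozen copy of the previous model while it trains on a new task. The sketch below shows the generic temperature-scaled distillation term (KL divergence between softened teacher and student distributions, scaled by T^2) added to the current-task loss. This is a minimal illustration under stated assumptions, not TAM-CL's exact objective: the function names, the weight `alpha`, and the default `temperature` are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[float],
                      teacher_logits: List[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The teacher is assumed to be a frozen copy of the model from the
    previous task; the student is the model being trained now.
    """
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_t, p_s) if pt > 0)
    return (temperature ** 2) * kl


def continual_loss(task_loss: float,
                   student_logits: List[float],
                   teacher_logits: List[float],
                   alpha: float = 0.5,
                   temperature: float = 2.0) -> float:
    """Hypothetical combined objective: new-task loss plus a weighted
    distillation penalty that discourages drifting away from old behaviour."""
    return task_loss + alpha * distillation_loss(
        student_logits, teacher_logits, temperature)


if __name__ == "__main__":
    same = [1.0, 2.0, 3.0]
    print(distillation_loss(same, same))             # identical logits give zero penalty
    print(distillation_loss([3.0, 2.0, 1.0], same))  # diverging logits give a positive penalty
```

When the student reproduces the teacher's logits the penalty vanishes, so only the new-task loss drives learning. As the student drifts, the penalty grows, and `alpha` trades plasticity on the new task against stability on earlier ones.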
Findings
Achieves state-of-the-art performance on multimodal tasks
Effectively mitigates catastrophic forgetting
Scales with minimal memory and time overhead
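The scaling claim rests on a simple idea. A large shared backbone is reused across all tasks, and each new task adds only a small block of task-specific parameters, so memory grows linearly but slowly with the number of tasks. The sketch below only illustrates that bookkeeping. The class name, the zero initialisation, and all sizes are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DynamicModel:
    """Hypothetical sketch: a fixed-size shared backbone plus a small
    task-specific parameter vector allocated for every new task."""
    backbone_params: int
    task_param_size: int
    task_params: Dict[str, List[float]] = field(default_factory=dict)

    def add_task(self, task_name: str) -> None:
        # Allocate a fresh task-specific vector (zero-initialised for simplicity).
        if task_name in self.task_params:
            raise ValueError(f"task {task_name!r} already registered")
        self.task_params[task_name] = [0.0] * self.task_param_size

    def total_params(self) -> int:
        # Shared parameters are counted once; per-task parameters grow linearly.
        return self.backbone_params + sum(len(v) for v in self.task_params.values())

    def overhead_ratio(self) -> float:
        # Extra parameters relative to the shared backbone.
        return (self.total_params() - self.backbone_params) / self.backbone_params


if __name__ == "__main__":
    model = DynamicModel(backbone_params=1000, task_param_size=10)
    for name in ["task_a", "task_b", "task_c"]:
        model.add_task(name)
    print(model.total_params(), model.overhead_ratio())
```

With these toy sizes, three tasks add 30 parameters to a 1,000-parameter backbone, a 3% overhead. The real ratio depends entirely on the actual backbone and per-task block sizes.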
Abstract
Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. The increasing size and computational demands of fine-tuning large pre-trained transformer neural networks pose significant challenges for the widespread adoption of these models for applications that demand on-edge computing. To tackle this challenge, continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent. However, current CL methods mainly focus on learning tasks that are exclusively vision-based or language-based. We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language, known as Vision-and-Language (VaL) tasks. Due to the success of transformers in other modalities, our architecture has the…
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Speech and dialogue systems
Methods: Focus · Knowledge Distillation · Balanced Selection
