Dynamic Transformer Architecture for Continual Learning of Multimodal Tasks

Yuliang Cai; Mohammad Rostami

arXiv:2401.15275·cs.CV·January 30, 2024·1 cites

Open Access

TL;DR

This paper introduces TAM-CL, a transformer-based continual learning framework for multimodal vision-and-language tasks, enabling dynamic model expansion and knowledge transfer to mitigate forgetting and achieve state-of-the-art results.

Contribution

It proposes a scalable, dynamic transformer architecture with task-specific parameters and knowledge distillation for effective continual learning in multimodal settings.
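
The two ingredients named above, task-specific parameters added per task and knowledge distillation from an earlier model, can be illustrated with a minimal sketch. This is a generic illustration under stated assumptions, not TAM-CL's actual architecture or loss, whose details are not given on this page. The names `DynamicModel`, `add_task`, and `distillation_loss`, and the temperature value, are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumption: a standard soft-label distillation term; the paper's
    exact formulation may differ.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


class DynamicModel:
    """Toy stand-in for a model that grows by one parameter set per task."""

    def __init__(self, dim):
        self.dim = dim
        self.task_params = {}  # hypothetical: task name -> task-specific vector

    def add_task(self, task_name):
        # Expansion step: allocate new parameters for the incoming task
        # and leave the parameters of earlier tasks untouched.
        self.task_params[task_name] = [0.0] * self.dim
        return self.task_params[task_name]
```

In a setup like this, the model before expansion would act as the teacher on inputs from earlier tasks, and the distillation term would be added to the new task's training loss to discourage drift on what was already learned.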

Findings

01. Achieves state-of-the-art performance on multimodal tasks

02. Effectively mitigates catastrophic forgetting

03. Scales with minimal memory and time overhead

Abstract

Transformer neural networks are increasingly replacing prior architectures in a wide range of applications in different data modalities. The increasing size and computational demands of fine-tuning large pre-trained transformer neural networks pose significant challenges for the widespread adoption of these models for applications that demand on-edge computing. To tackle this challenge, continual learning (CL) emerges as a solution by facilitating the transfer of knowledge across tasks that arrive sequentially for an autonomously learning agent. However, current CL methods mainly focus on learning tasks that are exclusively vision-based or language-based. We propose a transformer-based CL framework focusing on learning tasks that involve both vision and language, known as Vision-and-Language (VaL) tasks. Due to the success of transformers in other modalities, our architecture has the…

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Speech and dialogue systems

Methods: Focus · Knowledge Distillation · Balanced Selection