Task-Attentive Transformer Architecture for Continual Learning of Vision-and-Language Tasks Using Knowledge Distillation

Yuliang Cai; Jesse Thomason; Mohammad Rostami

arXiv:2303.14423 · cs.LG · March 28, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a transformer-based continual learning architecture for vision-and-language tasks that dynamically increases parameters and uses knowledge distillation to prevent forgetting, achieving state-of-the-art results.

Contribution

It presents a novel scalable transformer architecture for bimodal continual learning that dynamically adds parameters and employs knowledge distillation to improve performance.

Findings

01

Achieves state-of-the-art performance on vision-and-language tasks.

02

Requires minimal memory and computational overhead.

03

Effectively mitigates catastrophic forgetting in continual learning.

Abstract

The size and computational load of fine-tuning large-scale pre-trained neural networks are becoming two major obstacles to adopting machine learning in many applications. Continual learning (CL) can serve as a remedy by enabling knowledge transfer across sequentially arriving tasks, which relaxes the need to fine-tune all network weights from scratch. However, existing CL algorithms primarily consider unimodal vision-only or language-only tasks. We develop a transformer-based CL architecture for learning bimodal vision-and-language tasks, based on dynamically increasing the number of learnable parameters and using knowledge distillation. The additional parameters are used to specialize the network for each task. Our approach enables sharing information between the tasks while addressing the challenge of catastrophic forgetting. Our approach is scalable learning to…
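The abstract names knowledge distillation as the mechanism that limits forgetting but does not give the loss. As a rough illustration only, the sketch below shows the generic temperature-softened distillation objective combined with a cross-entropy term for the current task. It is not the authors' formulation: the function names, the `alpha` weighting, and the `temperature` value are all assumptions introduced here.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(s - peak) for s in scaled))
    return [s - log_norm for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    The teacher stands for a frozen copy of the model from before the new
    task arrived; the student is the model currently being trained.
    """
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)
    kl = sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )
    return kl * temperature ** 2


def cross_entropy(logits, label):
    """Negative log-likelihood of the correct class for one example."""
    return -log_softmax(logits)[label]


def combined_loss(current_logits, label, student_old_logits,
                  teacher_old_logits, alpha=0.5, temperature=2.0):
    """Hypothetical weighting of the new-task loss and the distillation term.

    alpha is an assumed hyperparameter, not a value taken from the paper.
    """
    task_term = cross_entropy(current_logits, label)
    distill_term = distillation_loss(
        student_old_logits, teacher_old_logits, temperature
    )
    return (1.0 - alpha) * task_term + alpha * distill_term


if __name__ == "__main__":
    # Toy logits: the student has drifted away from the teacher's outputs
    # on an earlier task while fitting the current one.
    loss = combined_loss(
        current_logits=[2.0, 0.5, -1.0],
        label=0,
        student_old_logits=[0.2, 1.5, 0.1],
        teacher_old_logits=[0.0, 2.0, 0.0],
    )
    print(f"combined loss: {loss:.4f}")
```

The distillation term is zero when the student reproduces the teacher's outputs exactly and grows as the two diverge, so minimizing it alongside the task loss penalizes drift on earlier tasks. How the paper pairs this with its task-specific parameters is not stated in the text shown here.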

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · COVID-19 diagnosis using AI