Enhancing Continual Learning in Visual Question Answering with Modality-Aware Feature Distillation

Malvina Nikandrou; Georgios Pantazopoulos; Ioannis Konstas; Alessandro Suglia

arXiv:2406.19297·cs.CV·June 28, 2024

TL;DR

This paper introduces a modality-aware feature distillation method to improve continual learning in visual question answering by addressing the different learning dynamics of each modality, leading to better retention across tasks.

Contribution

The paper proposes a modality-aware feature distillation (MAFED) technique for continual learning in multimodal VQA models. The technique accounts for the fact that each modality evolves at a different rate across a sequence of tasks.

Findings

01. MAFED outperforms existing baselines across models of varying scale in three multimodal continual learning settings.

02. Modality-aware distillation complements experience replay in continual learning.

03. Addressing modality-specific dynamics reduces forgetting in multimodal models.

Abstract

Continual learning focuses on incrementally training a model on a sequence of tasks with the aim of learning new tasks while minimizing performance drop on previous tasks. Existing approaches at the intersection of Continual Learning and Visual Question Answering (VQA) do not study how the multimodal nature of the input affects the learning dynamics of a model. In this paper, we demonstrate that each modality evolves at different rates across a continuum of tasks and that this behavior occurs in established encoder-only models as well as modern recipes for developing Vision & Language (VL) models. Motivated by this observation, we propose a modality-aware feature distillation (MAFED) approach which outperforms existing baselines across models of varying scale in three multimodal continual learning settings. Furthermore, we provide ablations showcasing that modality-aware distillation…
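
The abstract names the approach but does not give the loss, so the following is only a generic sketch of what a feature distillation penalty with separate per-modality terms could look like. It is not the authors' implementation. The function names, the `is_visual` mask, the squared-distance penalty, and the fixed weights `w_visual` and `w_text` are all assumptions made for illustration.

```python
from typing import Sequence

Vector = Sequence[float]


def mean_squared_distance(old: Sequence[Vector], new: Sequence[Vector]) -> float:
    """Mean squared L2 distance between paired token features."""
    if len(old) != len(new):
        raise ValueError("old and new must contain the same number of tokens")
    if not old:
        return 0.0
    total = 0.0
    for o, n in zip(old, new):
        total += sum((a - b) ** 2 for a, b in zip(o, n))
    return total / len(old)


def modality_aware_distillation(
    old_feats: Sequence[Vector],
    new_feats: Sequence[Vector],
    is_visual: Sequence[bool],
    w_visual: float = 0.5,  # hypothetical weight, not taken from the paper
    w_text: float = 0.5,    # hypothetical weight, not taken from the paper
) -> float:
    """Penalize drift from the previous model's features, per modality.

    Each modality is averaged over its own tokens before weighting, so a
    modality with many tokens does not dominate the penalty.
    """
    if not (len(old_feats) == len(new_feats) == len(is_visual)):
        raise ValueError("features and mask must have the same length")
    vis_old = [f for f, v in zip(old_feats, is_visual) if v]
    vis_new = [f for f, v in zip(new_feats, is_visual) if v]
    txt_old = [f for f, v in zip(old_feats, is_visual) if not v]
    txt_new = [f for f, v in zip(new_feats, is_visual) if not v]
    return (
        w_visual * mean_squared_distance(vis_old, vis_new)
        + w_text * mean_squared_distance(txt_old, txt_new)
    )


# Two visual tokens that drifted slightly, one text token that drifted more.
old = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
new = [[1.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
loss = modality_aware_distillation(old, new, [True, True, False])
```

In this toy input, averaging over all three tokens uniformly would give a penalty of 2.0, while the per-modality version gives 2.5 because the single text token is not diluted by the two visual tokens. In a continual learning setup, a term like this would typically be added to the task loss when training on a new task, with `old_feats` taken from a frozen copy of the model from the previous task.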

Code & Models

Repositories

MalvinaNikandrou/mafed (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning