One VLM to Keep it Learning: Generation and Balancing for Data-free Continual Visual Question Answering

Deepayan Das; Davide Talon; Massimiliano Mancini; Yiming Wang; Elisa Ricci

arXiv:2411.02210 · cs.CV · March 19, 2025

Open Access

TL;DR

This paper introduces GaB, a novel data-free continual learning approach for Visual Question Answering that generates pseudo-rehearsal data using a vision-language model, effectively mitigating forgetting without storing past data.

Contribution

GaB is the first data-free method leveraging VLMs for pseudo-rehearsal data generation in continual VQA, with a balancing module to improve data distribution alignment.

Findings

01 GaB outperforms all data-free baselines in continual VQA tasks.

02 GaB matches the performance of methods with access to past data.

03 The balancing module effectively aligns the generated data with the ground-truth distribution.

Abstract

Vision-Language Models (VLMs) have shown significant promise in Visual Question Answering (VQA) tasks by leveraging web-scale multimodal datasets. However, these models often struggle with continual learning due to catastrophic forgetting when adapting to new tasks. As an effective remedy for catastrophic forgetting, the rehearsal strategy reuses data from past tasks when learning a new task. However, such a strategy requires storing past data, which might not be feasible due to hardware constraints or privacy concerns. In this work, we propose the first data-free method that leverages the language generation capability of a VLM, instead of relying on external models, to produce pseudo-rehearsal data for addressing continual VQA. Our proposal, named GaB, generates pseudo-rehearsal data by posing previous task questions on new task data. Yet, despite being effective, the…
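
The idea of posing previous-task questions on new-task data can be illustrated with a minimal Python sketch. This is not the authors' implementation: `make_pseudo_rehearsal`, `build_training_set`, `ask_like_past_task`, and `answer` are hypothetical names, the two callables stand in for whatever the VLM does to generate a past-task-style question and its answer, and the balancing step is not modeled here.

```python
from typing import Callable, List, Tuple

# A VQA sample is modeled as (image_id, question, answer); this layout is an assumption.
Sample = Tuple[str, str, str]


def make_pseudo_rehearsal(
    new_task_images: List[str],
    ask_like_past_task: Callable[[str], str],
    answer: Callable[[str, str], str],
) -> List[Sample]:
    """Build pseudo-rehearsal samples from new-task images only.

    ask_like_past_task: hypothetical stand-in for a VLM that writes a
        question in the style of a previous task for a given image.
    answer: hypothetical stand-in for a VLM that answers that question.
    No past-task images or annotations are stored or read.
    """
    samples: List[Sample] = []
    for image_id in new_task_images:
        question = ask_like_past_task(image_id)
        samples.append((image_id, question, answer(image_id, question)))
    return samples


def build_training_set(new_task_data: List[Sample], pseudo: List[Sample]) -> List[Sample]:
    """Mix real new-task samples with generated pseudo-rehearsal samples."""
    return list(new_task_data) + list(pseudo)
```

In this sketch the only inputs are the current task's images and annotations, which is what makes the setting data-free with respect to past tasks; a real pipeline would swap the two callables for actual model calls.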

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Geographic Information Systems Studies