Continual VQA for Disaster Response Systems

Aditya Kane; V Manushree; Sahil Khose

arXiv:2209.10320 · cs.CV · November 14, 2022

TL;DR

This paper introduces a continual visual question answering system for disaster response that leverages pre-trained CLIP embeddings and experience replay to improve performance and mitigate catastrophic forgetting in real-life scenarios.

Contribution

It presents a novel continual VQA approach using CLIP embeddings and experience replay, surpassing previous methods on the FloodNet dataset.

Findings

01 Supervised training with CLIP embeddings improves VQA accuracy.

02 Continual learning methods reduce catastrophic forgetting.

03 The approach achieves state-of-the-art results on the FloodNet dataset.

Abstract

Visual Question Answering (VQA) is a multi-modal task in which a system answers a natural-language question about an input image, which requires semantically understanding the image's contents. Using VQA for disaster management is an important line of research because of the range of problems a VQA system can address. However, the main challenge is the delay caused by generating labels when assessing the affected areas. To tackle this, we deploy a pre-trained CLIP model, which is trained on image-text pairs. However, we empirically see that the model has poor zero-shot performance. Thus, we instead use pre-trained text and image embeddings from this model for supervised training and surpass previous state-of-the-art results on the FloodNet dataset. We expand this to a continual setting, which is a more real-life scenario. We tackle the problem of…
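The experience replay idea named in the summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes a fixed-size memory filled by reservoir sampling, which is one common way to build a replay buffer, and the names `ReplayBuffer` and `mixed_batch` are hypothetical. Each stored example stands in for whatever a training step consumes, for instance a tuple of precomputed image embedding, question embedding, and answer label.

```python
import random


class ReplayBuffer:
    """Fixed-size memory of past examples, filled by reservoir sampling.

    Hypothetical helper for illustration; not taken from the paper's code.
    """

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []   # examples currently held in memory
        self.seen = 0     # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            # Memory not full yet: always keep the example.
            self.items.append(example)
        else:
            # Memory full: draw a slot among all examples seen so far and
            # overwrite only if the slot falls inside the buffer, so every
            # example seen has the same chance of being retained.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, k):
        # Draw up to k stored examples without replacement.
        k = min(k, len(self.items))
        return self.rng.sample(self.items, k)


def mixed_batch(new_examples, buffer, replay_size):
    """Mix current-task examples with replayed ones, then store the new ones."""
    new_examples = list(new_examples)
    batch = new_examples + buffer.sample(replay_size)
    for example in new_examples:
        buffer.add(example)
    return batch
```

In a continual setting, each training batch for a new task would be built with `mixed_batch`, so the model keeps seeing a few examples from earlier tasks while learning the current one. The buffer capacity and replay size are free parameters in this sketch, not values from the paper.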

Code & Models

Repositories

adityakane2001/continual_vqa (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Language-Image Pre-training · Experience Replay