Single-Modal Entropy based Active Learning for Visual Question Answering

Dong-Jin Kim; Jae Won Cho; Jinsoo Choi; Yunjae Jung; In So Kweon

arXiv:2110.10906 · cs.CV · November 19, 2021 · 5 cites

Open Access

TL;DR

This paper introduces a novel active learning method for Visual Question Answering that leverages single-modal branches and mutual information to efficiently select informative samples, reducing labeling costs.

Contribution

The paper proposes the Single-Modal Entropic Measure (SMEM), a single-modal entropy-based sample acquisition strategy, and combines it with self-distillation for active learning in multi-modal VQA, with the aim of improving labeling efficiency and model performance.

Findings

01 Achieves state-of-the-art results on VQA datasets

02 Demonstrates cost-effective sample selection

03 Outperforms existing active learning baselines

Abstract

Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (e.g., Visual Question Answering), can be expensive and time-consuming. In addition, with the ever-growing amounts of data and architecture complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition through the use of ad hoc single-modal branches for each input to leverage its information. Our mutual-information-based sample acquisition strategy, Single-Modal Entropic Measure (SMEM), together with our self-distillation technique, enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple…
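
The abstract does not spell out the SMEM formula, so the following is only a minimal sketch of the general idea of entropy-based acquisition over single-modal branch outputs, not the authors' exact measure. It assumes each unlabeled sample already has an answer distribution from a visual-only branch and from a question-only branch. The names `entropy`, `single_modal_score`, `select_samples`, and the weight `alpha` are hypothetical and chosen for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete answer distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def single_modal_score(
    p_visual: Sequence[float],
    p_question: Sequence[float],
    alpha: float = 0.5,
) -> float:
    """Weighted sum of the two single-modal entropies.

    Assumption: a plain convex combination stands in for the paper's
    measure, which is not given in the abstract.
    """
    return alpha * entropy(p_visual) + (1.0 - alpha) * entropy(p_question)


def select_samples(
    pool: Dict[str, Tuple[Sequence[float], Sequence[float]]],
    budget: int,
    alpha: float = 0.5,
) -> List[str]:
    """Return the ids of the `budget` highest-scoring unlabeled samples."""
    ranked = sorted(
        pool,
        key=lambda k: single_modal_score(pool[k][0], pool[k][1], alpha),
        reverse=True,
    )
    return ranked[:budget]


if __name__ == "__main__":
    # Toy pool: id -> (visual-branch probs, question-branch probs).
    toy_pool = {
        "a": ([0.5, 0.5], [0.5, 0.5]),  # both branches uncertain
        "b": ([1.0, 0.0], [1.0, 0.0]),  # both branches confident
        "c": ([0.9, 0.1], [0.5, 0.5]),  # only the question branch uncertain
    }
    print(select_samples(toy_pool, budget=2))  # prints ['a', 'c']
```

In this toy pool, the sample on which both branches are uncertain is ranked first and the sample on which both are confident is never selected, which is the behavior an uncertainty-driven acquisition function is meant to have.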

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques

Methods: High-Order Consensuses