Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim, Jae Won Cho, Jinsoo Choi, Yunjae Jung, In So Kweon

TL;DR
This paper introduces a novel active learning method for Visual Question Answering that leverages single-modal branches and mutual information to efficiently select informative samples, reducing labeling costs.
Contribution
It proposes a single-modal, entropy-based sample acquisition strategy, Single-Modal Entropic Measure (SMEM), and combines it with self-distillation for active learning in multi-modal VQA, improving efficiency and performance.
Findings
Achieves state-of-the-art results on VQA datasets
Demonstrates cost-effective sample selection
Outperforms existing active learning baselines
Abstract
Constructing a large-scale labeled dataset in the real world, especially for high-level tasks (e.g., Visual Question Answering), can be expensive and time-consuming. In addition, with the ever-growing amounts of data and architecture complexity, Active Learning has become an important aspect of computer vision research. In this work, we address Active Learning in the multi-modal setting of Visual Question Answering (VQA). In light of the multi-modal inputs, image and question, we propose a novel method for effective sample acquisition through the use of ad hoc single-modal branches for each input to leverage its information. Our mutual-information-based sample acquisition strategy, Single-Modal Entropic Measure (SMEM), in addition to our self-distillation technique, enables the sample acquisitor to exploit all present modalities and find the most informative samples. Our novel idea is simple…
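The abstract is truncated here and does not give the SMEM formula, so the following is only a minimal sketch of the general idea of entropy-based sample acquisition with single-modal branches. It assumes each unlabeled sample comes with two answer distributions, one from a hypothetical image-only branch and one from a hypothetical question-only branch, and it scores a sample by the sum of the two Shannon entropies. That scoring rule is a stand-in chosen for illustration, not the paper's mutual-information-based measure, and the names `entropy`, `acquisition_score`, and `select_samples` are invented for this example.

```python
import math
from typing import List, Sequence, Tuple


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete probability distribution."""
    # Terms with p == 0 contribute nothing, so they are skipped to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def acquisition_score(p_image: Sequence[float], p_question: Sequence[float]) -> float:
    """Illustrative score: total uncertainty of the two single-modal branches.

    Assumption: a plain sum of entropies, used here as a stand-in for SMEM.
    """
    return entropy(p_image) + entropy(p_question)


def select_samples(
    pool: Sequence[Tuple[Sequence[float], Sequence[float]]], budget: int
) -> List[int]:
    """Return the indices of the `budget` highest-scoring unlabeled samples."""
    scored = [
        (acquisition_score(p_image, p_question), idx)
        for idx, (p_image, p_question) in enumerate(pool)
    ]
    # Highest score first; ties are broken by the lower index.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [idx for _, idx in scored[:budget]]


# Toy pool: (image-branch distribution, question-branch distribution) per sample.
pool = [
    ([1.0, 0.0], [1.0, 0.0]),  # both branches fully confident
    ([0.5, 0.5], [0.5, 0.5]),  # both branches maximally uncertain
    ([0.9, 0.1], [0.5, 0.5]),  # only the question branch is uncertain
]
selected = select_samples(pool, budget=2)
```

In an active learning loop, the selected indices would be sent for annotation, added to the labeled set, and the model retrained before the next acquisition round.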
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: High-Order Consensuses
