Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Hyeonwoo Noh, Taehoon Kim, Jonghwan Mun, Bohyung Han

TL;DR
This paper improves visual question answering by learning a task-conditional visual classifier through unsupervised task discovery and transferring it to VQA models, so that they can handle out-of-vocabulary answers using off-the-shelf visual and linguistic data.
Contribution
It proposes transferring a task-conditional visual classifier, learned without question supervision, to VQA models, bridging the gap between question-free visual data and question-dependent answering.
Findings
Successfully generalizes to out-of-vocabulary answers
Leverages structured lexical databases and visual descriptions
Enhances VQA performance with transferred visual classifiers
Abstract
We study how to leverage off-the-shelf visual and linguistic data to cope with out-of-vocabulary answers in the visual question answering task. Existing large-scale visual datasets with annotations such as image class labels, bounding boxes and region descriptions are good sources for learning rich and diverse visual concepts. However, it is not straightforward how the visual concepts can be captured and transferred to visual question answering models due to the missing link between question-dependent answering models and visual data without questions. We tackle this problem in two steps: 1) learning a task conditional visual classifier, which is capable of solving diverse question-specific visual recognition tasks, based on unsupervised task discovery and 2) transferring the task conditional visual classifier to visual question answering models. Specifically, we employ linguistic knowledge…
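To make the idea of a task conditional visual classifier more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: the sigmoid gating, the random weights, the dimensions, and every name (`TaskConditionalClassifier`, `task_proj`, `answer_weights`, the toy task vectors) are assumptions chosen for illustration. It only shows the general shape of a classifier whose answer distribution depends on both a visual feature and a task feature.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TaskConditionalClassifier:
    """Toy model of p(answer | visual feature, task feature).

    Hypothetical architecture: the task feature produces a sigmoid gate
    that modulates the visual feature before a linear answer layer.
    """

    def __init__(self, visual_dim, task_dim, num_answers, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        self.task_proj = init(visual_dim, task_dim)          # task -> gate
        self.answer_weights = init(num_answers, visual_dim)  # fused -> scores

    def predict(self, visual_feature, task_feature):
        gate = [1.0 / (1.0 + math.exp(-g))
                for g in matvec(self.task_proj, task_feature)]
        fused = [v * g for v, g in zip(visual_feature, gate)]
        return softmax(matvec(self.answer_weights, fused))


clf = TaskConditionalClassifier(visual_dim=4, task_dim=3, num_answers=5)
region = [0.2, -0.5, 1.0, 0.3]   # made-up visual feature for one image region
color_task = [1.0, 0.0, 0.0]     # made-up task vector, e.g. "which color?"
object_task = [0.0, 1.0, 0.0]    # made-up task vector, e.g. "which object?"

p_color = clf.predict(region, color_task)
p_object = clf.predict(region, object_task)
print(p_color)
print(p_object)
```

The same region yields a different answer distribution under each task vector, which is the property the abstract relies on. In the transfer step the abstract describes, the task feature would presumably be derived from the question rather than supplied by hand; how the paper does this is not covered in the text above.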
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
