Can Open Domain Question Answering Systems Answer Visual Knowledge Questions?
Jiawen Zhang, Abhijit Mishra, Avinesh P.V.S, Siddharth Patwardhan, and Sachin Agarwal

TL;DR
This paper presents a data-efficient method for visual question answering by rewriting visual questions into text-based questions and leveraging existing open domain QA systems, achieving competitive results with limited training data.
Contribution
It introduces a novel approach that reuses existing text-based QA systems for visual questions through entity-based question rewriting, reducing data requirements.
Findings
Achieves competitive performance on the OKVQA dataset.
Uses only 10% of training data for comparable results.
Proposes unsupervised and weakly supervised rewriting strategies.
Abstract
The task of Outside Knowledge Visual Question Answering (OKVQA) requires an automatic system to answer natural language questions about pictures and images using external knowledge. We observe that many visual questions, which contain deictic referential phrases referring to entities in the image, can be rewritten as "non-grounded" questions and can be answered by existing text-based question answering systems. This allows for the reuse of existing text-based Open Domain Question Answering (QA) Systems for visual question answering. In this work, we propose a potentially data-efficient approach that reuses existing systems for (a) image analysis, (b) question rewriting, and (c) text-based question answering to answer such visual questions. Given an image and a question pertaining to that image (a visual question), we first extract the entities present in the image using pre-trained…
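The three stages named in the abstract (image analysis, question rewriting, text-based question answering) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `DEICTIC` pattern, `rewrite_question`, and `answer_visual_question` are hypothetical names, the rewriting here is a naive rule-based substitution rather than the unsupervised or weakly supervised strategies the paper proposes, and the entity detector and QA system are passed in as placeholder callables.

```python
import re
from typing import Any, Callable, List

# Hypothetical pattern for deictic phrases such as "this animal" or "that building".
# It is deliberately naive and will also match relative clauses like "that holds".
DEICTIC = re.compile(r"\b(this|that|these|those)\s+(\w+)", re.IGNORECASE)


def rewrite_question(question: str, entities: List[str]) -> str:
    """Replace the first deictic phrase with the first detected entity.

    Returns the question unchanged if no entity was detected or no
    deictic phrase is found.
    """
    if not entities:
        return question
    # A function replacement avoids escape-sequence handling in the entity text.
    return DEICTIC.sub(lambda match: entities[0], question, count=1)


def answer_visual_question(
    image: Any,
    question: str,
    detect_entities: Callable[[Any], List[str]],  # assumed image-analysis component
    text_qa: Callable[[str], str],                # assumed text-based QA component
) -> str:
    """Run the three stages: entity extraction, rewriting, text QA."""
    entities = detect_entities(image)
    rewritten = rewrite_question(question, entities)
    return text_qa(rewritten)
```

With a detector that returns `["giant panda"]`, the question "What country is this animal native to?" would be rewritten so that "this animal" is replaced by "giant panda" before being handed to the text-based QA system.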
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Adam · Attention Dropout · Linear Warmup With Linear Decay · Layer Normalization · WordPiece · Residual Connection
