Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering

Man Luo; Yankai Zeng; Pratyay Banerjee; Chitta Baral

arXiv:2109.04014·cs.CL·September 10, 2021

TL;DR

This paper introduces a weakly-supervised visual retriever-reader pipeline for knowledge-based VQA, utilizing a newly collected universal knowledge base to improve answer accuracy on the OK-VQA dataset.

Contribution

It proposes a novel weakly-supervised retriever-reader framework and a universal knowledge base for fair comparison and improved performance in knowledge-based VQA.

Findings

1. A strong retriever significantly boosts reader performance.

2. The proposed methods outperform baselines on OK-VQA.

3. The universal knowledge base enables fairer model comparisons.

Abstract

Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. One dataset that is widely used to evaluate knowledge-based VQA is OK-VQA, but it lacks a gold-standard knowledge corpus for retrieval. Existing work leverages different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because the knowledge bases vary, it is hard to compare models' performance fairly. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on the given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles:…
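The two-stage flow described in the abstract (retrieve knowledge, then read it to predict an answer) can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the paper's method: the token-overlap scorer stands in for a learned retriever, the image is represented by a hypothetical caption string, the three-sentence knowledge base is made up, and the candidate-matching `read` function is a placeholder rather than either of the paper's reader styles.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace after dropping basic punctuation."""
    for ch in "?.,":
        text = text.replace(ch, " ")
    return text.lower().split()


def retrieve(query, knowledge_base, k=2):
    """Toy retriever (assumption): rank sentences by token overlap with the query."""
    q = Counter(tokenize(query))
    scored = []
    for sentence in knowledge_base:
        s = Counter(tokenize(sentence))
        overlap = sum(min(q[t], s[t]) for t in q)
        scored.append((overlap, sentence))
    # Stable sort keeps the original order among equally scored sentences.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sentence for score, sentence in scored[:k] if score > 0]


def read(passages, candidates):
    """Toy reader (assumption): pick the candidate mentioned most in the passages."""
    counts = Counter()
    for passage in passages:
        tokens = tokenize(passage)
        for candidate in candidates:
            counts[candidate] += tokens.count(candidate)
    if not counts or max(counts.values()) == 0:
        return None  # no candidate is supported by the retrieved knowledge
    return max(candidates, key=lambda c: counts[c])


# Hypothetical knowledge base, question, and image caption.
KB = [
    "A fire hydrant supplies water to firefighters.",
    "Bananas are rich in potassium.",
    "A stop sign is red and octagonal.",
]
question = "What does this object supply?"
caption = "a red fire hydrant on a sidewalk"

# The query combines the text of the question with text derived from the image.
passages = retrieve(question + " " + caption, KB, k=1)
answer = read(passages, candidates=["water", "potassium", "red"])
print(passages, answer)
```

The sketch keeps retrieval and reading as separate functions so that either stage can be swapped out independently, which is the property that lets a stronger retriever improve the reader's accuracy.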

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning