REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering

Yuanze Lin; Yujia Xie; Dongdong Chen; Yichong Xu; Chenguang Zhu; Lu Yuan

arXiv:2206.01201 · cs.CV · October 11, 2022 · 44 cites

TL;DR

This paper introduces REVIVE, a knowledge-based VQA method that uses explicit object-region information, both the regions themselves and the relationships within and among them, when retrieving knowledge and when producing the final answer.

Contribution

REVIVE is the first method to explicitly utilize object region information throughout the knowledge retrieval and answering stages in knowledge-based VQA.

Findings

01

Achieved 58.0% accuracy on OK-VQA, setting a new state-of-the-art.

02

Demonstrated the importance of regional information in different framework components.

03

Showed that better regional visual features lead to substantial performance improvements.

Abstract

This paper revisits visual representation in knowledge-based visual question answering (VQA) and demonstrates that using regional information in a better way can significantly improve the performance. While visual representation is extensively studied in traditional VQA, it is under-explored in knowledge-based VQA even though these two tasks share the common spirit, i.e., rely on visual input to answer the question. Specifically, we observe that in most state-of-the-art knowledge-based VQA methods: 1) visual features are extracted either from the whole image or in a sliding window manner for retrieving knowledge, and the important relationship within/among object regions is neglected; 2) visual features are not well utilized in the final answering model, which is counter-intuitive to some extent. Based on these observations, we propose a new knowledge-based VQA method REVIVE, which…
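The contrast the abstract draws between sliding-window crops and object regions can be illustrated with a minimal sketch. This is not the REVIVE implementation: the function names, the score threshold, and the detector output below are hypothetical, and the detections are hard-coded rather than produced by a real object detector.

```python
from typing import List, Tuple

# A box is (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]


def sliding_window_boxes(width: int, height: int, window: int, stride: int) -> List[Box]:
    """Enumerate fixed-size crops on a regular grid, ignoring image content."""
    boxes = []
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            boxes.append((left, top, left + window, top + window))
    return boxes


def object_region_boxes(
    detections: List[Tuple[Box, float]], min_score: float, top_k: int
) -> List[Box]:
    """Keep the top_k highest-scoring detector boxes at or above min_score."""
    kept = [d for d in detections if d[1] >= min_score]
    kept.sort(key=lambda d: d[1], reverse=True)
    return [box for box, _ in kept[:top_k]]


# Hypothetical detector output for a 224x224 image: (box, confidence score).
detections = [
    ((10, 20, 120, 200), 0.92),
    ((130, 40, 220, 180), 0.35),
    ((0, 0, 224, 224), 0.71),
]

grid = sliding_window_boxes(224, 224, window=112, stride=112)
regions = object_region_boxes(detections, min_score=0.5, top_k=2)
```

The grid crops are the same for every image, while the region crops follow whatever objects the detector found, which is the kind of regional information the abstract says existing methods neglect.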

Code & Models

Repositories

yzleroy/revive · PyTorch · Official

Videos

REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition