Entity-Focused Dense Passage Retrieval for Outside-Knowledge Visual Question Answering

Jialin Wu; Raymond J. Mooney

arXiv:2210.10176 · cs.CL · October 24, 2022

TL;DR

This paper introduces EnFoRe, an entity-focused retrieval model that enhances knowledge retrieval for outside-knowledge visual question answering, leading to improved accuracy and state-of-the-art results.

Contribution

The paper proposes a novel entity-focused retrieval approach with stronger supervision, improving knowledge relevance in OK-VQA systems.

Findings

01. EnFoRe achieves superior retrieval performance on OK-VQA.

02. Combining EnFoRe with VQA models yields new state-of-the-art results.

03. Entity recognition improves knowledge specificity and answer accuracy.

Abstract

Most Outside-Knowledge Visual Question Answering (OK-VQA) systems employ a two-stage framework that first retrieves external knowledge given the visual question and then predicts the answer based on the retrieved content. However, the retrieved knowledge is often inadequate. Retrievals are frequently too general and fail to cover the specific knowledge needed to answer the question. Also, the naturally available supervision (whether the passage contains the correct answer) is weak and does not guarantee question relevancy. To address these issues, we propose an Entity-Focused Retrieval (EnFoRe) model that provides stronger supervision during training and recognizes question-relevant entities to help retrieve more specific knowledge. Experiments show that our EnFoRe model achieves superior retrieval performance on OK-VQA, currently the largest outside-knowledge VQA dataset. We also combine…
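
The abstract's point that answer containment is a weak training signal can be illustrated with a minimal sketch. This is not the authors' code: the function names `normalize` and `weak_label`, the token-matching rule, and the example passages are all hypothetical, chosen only to show how a passage can contain the answer string without being relevant to the question.

```python
import re


def normalize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weak_label(passage: str, answers: list[str]) -> bool:
    """Hypothetical weak-supervision rule: a passage counts as positive
    if any answer appears in it as a contiguous token span."""
    tokens = normalize(passage)
    for answer in answers:
        ans = normalize(answer)
        if not ans:
            continue
        n = len(ans)
        if any(tokens[i:i + n] == ans for i in range(len(tokens) - n + 1)):
            return True
    return False


# Illustrative data only (not taken from OK-VQA).
answers = ["bear"]
relevant = "A teddy bear is a stuffed toy named after Theodore Roosevelt."
off_topic = "Investors could not bear the losses this quarter."
unrelated = "Dense retrievers score passages with an inner product."

labels = [weak_label(p, answers) for p in (relevant, off_topic, unrelated)]
print(labels)  # [True, True, False]
```

The second passage is labeled positive even though it has nothing to do with a question about a stuffed toy, which is the kind of false positive that motivates supervision stronger than answer containment.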

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning