Improving and Diagnosing Knowledge-Based Visual Question Answering via Entity Enhanced Knowledge Injection

Diego Garcia-Olano; Yasumasa Onoe; Joydeep Ghosh

arXiv:2112.06888·cs.CL·May 30, 2022·1 cites

TL;DR

This paper explores how injecting entity-enhanced knowledge graph embeddings into a bi-modal VQA system improves performance on KBVQA tasks, especially for questions about rare entities, without extra pre-training.

Contribution

It demonstrates effective knowledge injection methods in a bi-modal setting, analyzing their impact on VQA performance and interpretability, using publicly available datasets.

Findings

01 Significant performance gains on KBVQA datasets

02 Knowledge injection benefits rare entity questions

03 Improved explanations with entity-enhanced representations

Abstract

Knowledge-Based Visual Question Answering (KBVQA) is a bi-modal task requiring external world knowledge in order to correctly answer a text question about an associated image. Recent single-modality text work has shown that knowledge injection into pre-trained language models, specifically via entity-enhanced knowledge graph embeddings, can improve performance on downstream entity-centric tasks. In this work, we empirically study whether and how such methods, applied in a bi-modal setting, can improve an existing VQA system's performance on the KBVQA task. We experiment with two large publicly available VQA datasets: (1) KVQA, which contains mostly rare Wikipedia entities, and (2) OKVQA, which is less entity-centric and more aligned with common sense reasoning. Both lack explicit entity spans, and we study the effect of different weakly supervised and manual methods for obtaining them. Additionally we…
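To make the idea of injecting entity knowledge over entity spans concrete, here is a minimal sketch. It is not taken from the paper: it assumes injection is done by adding a knowledge graph entity vector to the token embeddings inside each entity span, which may differ from the authors' actual mechanism. The function name `inject_entity_embeddings`, the `weight` parameter, the entity id `ENT_A`, and all vectors are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# (start, end, entity_id): token indices [start, end) are linked to entity_id.
Span = Tuple[int, int, str]


def inject_entity_embeddings(
    token_embs: List[Vector],
    spans: List[Span],
    entity_table: Dict[str, Vector],
    weight: float = 1.0,
) -> List[Vector]:
    """Return a copy of token_embs with entity vectors added inside each span.

    Assumption (illustrative only): injection is a weighted element-wise sum.
    Spans whose entity id is missing from entity_table are left untouched.
    """
    out = [list(vec) for vec in token_embs]  # copy so the input is not mutated
    for start, end, entity_id in spans:
        entity_vec = entity_table.get(entity_id)
        if entity_vec is None:
            continue  # no knowledge available for this entity
        for i in range(start, end):
            out[i] = [t + weight * e for t, e in zip(out[i], entity_vec)]
    return out


# Toy question with four tokens; tokens 2 and 3 form one entity mention.
token_embs = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
spans = [(2, 4, "ENT_A")]                 # hypothetical entity id
entity_table = {"ENT_A": [0.5, -0.5]}     # hypothetical entity vector

injected = inject_entity_embeddings(token_embs, spans, entity_table)
print(injected)
```

In this toy setup only the tokens inside the span change, which mirrors why entity spans matter: without them, there is nowhere to attach the entity vectors. How the spans are obtained (weakly supervised or manual) is outside the scope of this sketch.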

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning