K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition

Kohei Uehara; Tatsuya Harada

arXiv:2203.07890 · cs.CV · March 16, 2022

TL;DR

This paper introduces K-VQG, a novel dataset and model for knowledge-aware visual question generation, aiming to enhance common-sense knowledge acquisition from images.

Contribution

It presents the first large, human-annotated dataset linking image questions to structured knowledge and a new model that encodes knowledge for question generation.

Findings

01

The proposed model outperforms existing VQG models on the K-VQG dataset.

02

The dataset effectively ties questions to structured knowledge.

03

Knowledge encoding improves question relevance and informativeness.

Abstract

Visual Question Generation (VQG) is the task of generating questions from images. When humans ask questions about an image, their goal is often to acquire new knowledge. However, existing studies on VQG have mainly addressed question generation from answers or question categories, overlooking the objective of knowledge acquisition. To introduce a knowledge acquisition perspective into VQG, we constructed a novel knowledge-aware VQG dataset called K-VQG. This is the first large, human-annotated dataset in which questions about images are tied to structured knowledge. We also developed a new VQG model that can encode and use knowledge as the target for a question. The experimental results show that our model outperforms existing models on the K-VQG dataset.

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning