K-VQG: Knowledge-aware Visual Question Generation for Common-sense Acquisition
Kohei Uehara, Tatsuya Harada

TL;DR
This paper introduces K-VQG, a novel dataset and model for knowledge-aware visual question generation, aiming to enhance common-sense knowledge acquisition from images.
Contribution
It presents the first large, human-annotated dataset linking image questions to structured knowledge and a new model that encodes knowledge for question generation.
Findings
The proposed model outperforms existing models on the K-VQG dataset.
The dataset effectively ties questions to structured knowledge.
Knowledge encoding improves question relevance and informativeness.
Abstract
Visual Question Generation (VQG) is the task of generating questions from images. When humans ask questions about an image, their goal is often to acquire new knowledge. However, existing studies on VQG have mainly addressed question generation from answers or question categories, overlooking the objective of knowledge acquisition. To introduce a knowledge acquisition perspective into VQG, we constructed a novel knowledge-aware VQG dataset called K-VQG. This is the first large, human-annotated dataset in which questions about images are tied to structured knowledge. We also developed a new VQG model that can encode and use knowledge as the target for a question. The experimental results show that our model outperforms existing models on the K-VQG dataset.
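To make "questions tied to structured knowledge" and "knowledge as the target for a question" more concrete, here is a minimal sketch of what one dataset record might look like. The abstract does not specify a schema, so everything here is an assumption made for illustration: the (head, relation, tail) triple form, the field names, the "[MASK]" token, and the helper build_example are all hypothetical and are not the dataset's actual format or the authors' code.

```python
from dataclasses import dataclass, replace
from typing import Tuple

MASK = "[MASK]"  # hypothetical placeholder for the unknown part of the knowledge


@dataclass(frozen=True)
class KnowledgeTriple:
    """Assumed form of a piece of structured knowledge: (head, relation, tail)."""
    head: str
    relation: str
    tail: str

    def mask(self, slot: str) -> Tuple["KnowledgeTriple", str]:
        """Hide one slot; return the masked triple and the hidden value."""
        if slot not in ("head", "tail"):
            raise ValueError("slot must be 'head' or 'tail'")
        hidden = getattr(self, slot)
        return replace(self, **{slot: MASK}), hidden


@dataclass(frozen=True)
class KVQGExample:
    """Hypothetical record: an image, a question, and the knowledge it targets."""
    image_id: str
    question: str
    target: KnowledgeTriple  # triple with one slot masked
    answer: str              # the value the question is meant to elicit


def build_example(image_id: str, question: str,
                  triple: KnowledgeTriple, slot: str) -> KVQGExample:
    """Pair a question with a masked triple so the hidden slot is the answer."""
    masked, hidden = triple.mask(slot)
    return KVQGExample(image_id=image_id, question=question,
                       target=masked, answer=hidden)


# Illustrative values only; not taken from the dataset.
triple = KnowledgeTriple(head="knife", relation="UsedFor", tail="cutting food")
example = build_example("img_0001", "What is this object used for?",
                        triple, slot="tail")
```

Under this framing, a question generator would receive the image and the masked triple as input and be trained to produce a question whose answer fills the hidden slot.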
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
