Guiding Visual Question Generation

Nihir Vedd; Zixu Wang; Marek Rei; Yishu Miao; Lucia Specia

arXiv:2110.08226 · cs.LG · July 27, 2022 · 1 citation

Open Access

TL;DR

This paper introduces guided models for visual question generation that condition questions on specific objects and categories, improving relevance and coherence over existing methods.

Contribution

It proposes explicit and implicit guidance mechanisms for VQG, enabling better control and diversity in generated questions.

Findings

01 An improvement of over 9 BLEU-4 points over the state of the art

02 Guidance improves grammatical coherence and relevance

03 Models effectively condition on specified objects and categories

Abstract

In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data. This makes training difficult and also poses issues for evaluation: multiple valid questions exist for most images, but only one or a few are captured by the human references. We present Guiding Visual Question Generation, a variant of VQG that conditions the question generator on categorical information based on expectations on the type of question and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate a question for; and (ii) an implicitly guided model that learns which objects and categories to condition…
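To make the explicit variant more concrete, here is a minimal sketch of how an actor's choice of category and objects might be turned into a conditioning sequence for a question generator. It is illustrative only and is not the paper's implementation: the function name `build_guidance_tokens`, the `[CAT]` and `[OBJ]` marker tokens, and the `max_objects` cap are all assumptions introduced for this example.

```python
def build_guidance_tokens(detected_objects, chosen_objects, category, max_objects=2):
    """Build a conditioning token sequence from an actor's selection.

    Hypothetical helper: the marker tokens and the object cap are
    assumptions for illustration, not details taken from the paper.
    """
    detected = set(detected_objects)

    # Keep only chosen objects that were detected in the image,
    # preserving the actor's order and dropping duplicates.
    kept = []
    for obj in chosen_objects:
        if obj in detected and obj not in kept:
            kept.append(obj)

    if not kept:
        raise ValueError("none of the chosen objects were detected in the image")

    # The category comes first, followed by the (capped) object list.
    return ["[CAT]", category, "[OBJ]"] + kept[:max_objects]


# Example: the actor asks for an "activity" question about the frisbee and dog.
tokens = build_guidance_tokens(
    detected_objects=["dog", "frisbee", "grass"],
    chosen_objects=["frisbee", "cat", "dog"],
    category="activity",
)
print(tokens)
```

In this sketch, the sequence would be fed to the generator alongside the image features. An implicit variant would replace the actor's selection with a learned choice; that step is not shown here.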

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques