Guiding Visual Question Generation
Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia

TL;DR
This paper introduces guided models for visual question generation that condition questions on specific objects and categories, improving relevance and coherence over existing methods.
Contribution
It proposes explicit and implicit guidance mechanisms for VQG, enabling better control and diversity in generated questions.
Findings
Improvement of more than 9 BLEU-4 points over the state of the art
Guidance improves grammatical coherence and relevance
Models effectively condition on specified objects and categories
Abstract
In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g. objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data. This makes training difficult and also poses issues for evaluation -- multiple valid questions exist for most images but only one or a few are captured by the human references. We present Guiding Visual Question Generation - a variant of VQG which conditions the question generator on categorical information based on expectations on the type of question and the objects it should explore. We propose two variants: (i) an explicitly guided model that enables an actor (human or automated) to select which objects and categories to generate a question for; and (ii) an implicitly guided model that learns which objects and categories to condition…
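As a rough illustration of the explicitly guided variant described in the abstract, the sketch below shows how an actor's choice of category and objects might be turned into a conditioning sequence for a question generator. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Guidance` class, the function names, the `[CAT]` and `[OBJ]` marker tokens, and the `max_objects` limit are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Guidance:
    """Hypothetical container for one guidance choice."""
    category: str        # question category, e.g. "activity"
    objects: List[str]   # object labels the question should explore


def explicit_guidance(detected_objects: Sequence[str],
                      category: str,
                      selected: Sequence[str]) -> Guidance:
    """Keep only the selected objects that were actually detected in the image."""
    detected = set(detected_objects)
    kept = [obj for obj in selected if obj in detected]
    if not kept:
        raise ValueError("none of the selected objects were detected in the image")
    return Guidance(category=category, objects=kept)


def build_guidance_tokens(guidance: Guidance, max_objects: int = 2) -> List[str]:
    """Flatten a guidance choice into a token sequence (assumed format)."""
    chosen = guidance.objects[:max_objects]
    return ["[CAT]", guidance.category, "[OBJ]"] + chosen


# Usage: the actor asks for an "activity" question about the dog and the frisbee.
guidance = explicit_guidance(
    detected_objects=["dog", "frisbee", "grass"],
    category="activity",
    selected=["dog", "frisbee", "car"],  # "car" is dropped: it was not detected
)
tokens = build_guidance_tokens(guidance)
```

In a full system these tokens would be embedded and combined with image features before decoding a question. An implicitly guided model would instead have to learn this selection itself rather than receive it from an actor.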
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Advanced Image and Video Retrieval Techniques
