Proposing Plausible Answers for Open-ended Visual Question Answering
Omid Bakhshandeh, Trung Bui, Zhe Lin, Walter Chang

TL;DR
This paper introduces Answer Proposal, a new VQA task in which a system generates a ranked list of plausible candidate answers informed by the semantics of the question, with the aim of grounding answers in a deeper understanding of what is being asked.
Contribution
It proposes the Answer Proposal task for VQA and explores two kinds of models, a neural generative model and a semantic graph matching model, evaluated both intrinsically and extrinsically.
Findings
The best model achieves high recall when proposing plausible answers
The models perform competitively with existing VQA solutions
Semantic understanding of the question improves the quality of answer proposals
Abstract
Answering open-ended questions is an essential capability for any intelligent agent. One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA), which attempts to evaluate a system's visual understanding through its answers to natural language questions about images. There exist many approaches to VQA, the majority of which do not exhibit deeper semantic understanding of the candidate answers they produce. We study the importance of generating plausible answers to a given question by introducing the novel task of "Answer Proposal": for a given open-ended question, a system should generate a ranked list of candidate answers informed by the semantics of the question. We experiment with various models, including a neural generative model as well as a semantic graph matching one. We provide both intrinsic and extrinsic evaluations for the…
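To make the shape of the task concrete, here is a minimal sketch of an answer proposer and a recall@k check. It is NOT either of the paper's models (the neural generator or the semantic graph matcher); it swaps in a simple question-type frequency prior, keyed on the first two words of the question, purely for illustration. The names `question_type`, `build_proposer`, and `recall_at_k`, the two-word key, and the use of recall@k as the intrinsic metric are all assumptions.

```python
from collections import Counter, defaultdict


def question_type(question: str, n_words: int = 2) -> str:
    """Hypothetical question-type key: the first n lowercase words."""
    return " ".join(question.lower().rstrip("?").split()[:n_words])


def build_proposer(train_pairs):
    """Count answers per question type from (question, answer) pairs.

    Illustrative frequency-prior baseline, not the paper's method.
    """
    counts = defaultdict(Counter)
    for q, a in train_pairs:
        counts[question_type(q)][a.lower()] += 1

    def propose(question: str, k: int = 5):
        # Ranked list of the k most frequent answers seen for this type;
        # an unseen question type yields an empty list.
        return [a for a, _ in counts[question_type(question)].most_common(k)]

    return propose


def recall_at_k(proposals, plausible, k):
    """Fraction of plausible answers that appear in the top-k proposals."""
    plausible = set(plausible)
    if not plausible:
        return 0.0
    return len(set(proposals[:k]) & plausible) / len(plausible)


if __name__ == "__main__":
    train = [
        ("What color is the car?", "red"),
        ("What color is the sky?", "blue"),
        ("What color is the bus?", "red"),
        ("How many dogs are there?", "2"),
        ("What color is the cat?", "black"),
    ]
    propose = build_proposer(train)
    ranked = propose("What color is the umbrella?", k=3)
    print(ranked)
    print(recall_at_k(ranked, {"red", "blue", "green"}, k=3))
```

A prior like this already restricts answers to the right semantic type (colors for "what color" questions), which is the intuition the task builds on; the paper's models aim to capture question semantics far more richly than a two-word key.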
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques
