Proposing Plausible Answers for Open-ended Visual Question Answering

Omid Bakhshandeh; Trung Bui; Zhe Lin; Walter Chang

arXiv:1610.06620 · cs.CL · October 25, 2016

TL;DR

This paper introduces the novel task of Answer Proposal in VQA, in which a system generates a ranked list of plausible answers from the semantics of the question, improving answer relevance and the system's understanding.

Contribution

It proposes Answer Proposal as a new task for VQA, explores a neural generative model and a semantic graph matching model, and evaluates them both intrinsically and extrinsically.

Findings

1. The best model achieves high recall in proposing plausible answers.
2. The models perform competitively with existing VQA solutions.
3. Semantic understanding improves answer-proposal quality.

Abstract

Answering open-ended questions is an essential capability for any intelligent agent. One of the most interesting recent open-ended question answering challenges is Visual Question Answering (VQA), which attempts to evaluate a system's visual understanding through its answers to natural language questions about images. There exist many approaches to VQA, the majority of which do not exhibit deeper semantic understanding of the candidate answers they produce. We study the importance of generating plausible answers to a given question by introducing the novel task of 'Answer Proposal': for a given open-ended question, a system should generate a ranked list of candidate answers informed by the semantics of the question. We experiment with various models including a neural generative model as well as a semantic graph matching one. We provide both intrinsic and extrinsic evaluations for the…
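The abstract defines the task by its input and output: a question goes in, and a ranked list of candidate answers comes out. As a rough, hypothetical illustration of that contract (not of either model the authors study), the Python sketch below ranks answers by how often they answered training questions sharing the same first two words, then scores the ranked lists with recall@k. The question-type heuristic, the toy data, and the choice of recall@k as the metric are all assumptions made for this example.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical baseline, NOT the paper's neural generative or semantic graph
# matching model: the first few words of a question stand in for its
# "semantics", and candidates are ranked by how often they answered training
# questions of the same coarse type.


def question_type(question: str, n_words: int = 2) -> str:
    """Return the first n_words lowercased tokens as a coarse question type."""
    return " ".join(question.lower().rstrip("?").split()[:n_words])


def build_index(train: List[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count answers per question type over (question, answer) pairs."""
    index: Dict[str, Counter] = {}
    for question, answer in train:
        index.setdefault(question_type(question), Counter())[answer] += 1
    return index


def propose_answers(index: Dict[str, Counter], question: str, k: int = 3) -> List[str]:
    """Return up to k candidate answers, most frequent first (empty if unseen type)."""
    counts = index.get(question_type(question), Counter())
    return [answer for answer, _ in counts.most_common(k)]


def recall_at_k(proposals: List[List[str]], gold: List[str], k: int) -> float:
    """Fraction of questions whose gold answer appears in the top-k proposals."""
    if not gold:
        return 0.0
    hits = sum(1 for props, g in zip(proposals, gold) if g in props[:k])
    return hits / len(gold)


if __name__ == "__main__":
    train = [
        ("What color is the bus?", "red"),
        ("What color is the sky?", "blue"),
        ("What color is the car?", "red"),
        ("How many dogs are there?", "2"),
        ("How many people are shown?", "3"),
        ("How many cats are there?", "2"),
        ("Is the man smiling?", "yes"),
        ("Is the door open?", "no"),
        ("Is the grass wet?", "yes"),
    ]
    test = [
        ("What color is the truck?", "blue"),
        ("How many birds are there?", "3"),
        ("Is the light on?", "yes"),
        ("What time is it?", "noon"),  # unseen question type: no proposals
    ]
    index = build_index(train)
    proposals = [propose_answers(index, q) for q, _ in test]
    gold = [a for _, a in test]
    print(proposals)
    print("recall@1 =", recall_at_k(proposals, gold, 1))
    print("recall@3 =", recall_at_k(proposals, gold, 3))
```

Running it prints the proposals `[['red', 'blue'], ['2', '3'], ['yes', 'no'], []]`, followed by recall@1 = 0.25 and recall@3 = 0.75. A frequency prior like this cannot propose an answer it never saw for a given question prefix, which is one reason to want proposals grounded in richer question semantics.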

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Advanced Image and Video Retrieval Techniques