Yin and Yang: Balancing and Answering Binary Visual Questions

Peng Zhang; Yash Goyal; Douglas Summers-Stay; Dhruv Batra; Devi Parikh

arXiv:1511.05099·cs.CL·April 20, 2016·39 cites

Open Access

TL;DR

This paper casts binary visual question answering on abstract scenes as visual verification: each question is reduced to a concept that is then checked against the image. It also balances the dataset to reduce language priors, so that models must rely on the high-level semantics of the scene.

Contribution

The paper introduces a concept-based verification method for binary VQA on abstract scenes and balances the dataset to control language priors, so that performance reflects understanding of the visual content rather than question statistics.

Findings

01

The approach matches the state of the art on unbalanced datasets.

02

It outperforms existing methods on balanced datasets.

03

Balanced datasets reduce language bias in VQA.
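
To illustrate why balancing matters, the sketch below scores a question-only baseline that never looks at the image. It is a toy illustration, not the paper's evaluation: the function name `prior_only_accuracy`, the example questions, and the answer counts are all assumptions made up for this example. The balanced case assumes that every question appears once with a "yes" scene and once with a "no" scene.

```python
from collections import Counter, defaultdict


def prior_only_accuracy(dataset):
    """Best accuracy of a predictor that sees only the question text.

    dataset: list of (question, answer) pairs; images are ignored on purpose.
    For each question the predictor outputs its most frequent answer.
    """
    answers_by_question = defaultdict(Counter)
    for question, answer in dataset:
        answers_by_question[question][answer] += 1
    # The majority answer is correct on max(count) of that question's examples.
    correct = sum(max(c.values()) for c in answers_by_question.values())
    return correct / len(dataset)


# Hypothetical toy data: the same question skewed toward "yes".
question = "is the dog sleeping?"
unbalanced = [(question, "yes")] * 3 + [(question, "no")]

# Hypothetical balanced data: each question paired with a "yes" and a "no" scene.
balanced = [(question, "yes"), (question, "no")] * 2

print(prior_only_accuracy(unbalanced))  # 0.75
print(prior_only_accuracy(balanced))    # 0.5
```

On the skewed toy data the image-blind predictor reaches 75%, while on the balanced toy data it falls to chance, so any gain above 50% has to come from the visual content.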

Abstract

The complex compositional structure of language makes problems at the intersection of vision and language challenging. But language also provides a strong prior that can result in good superficial performance, without the underlying models truly understanding the visual content. This can hinder progress in pushing the state of the art in the computer vision aspects of multi-modal AI. In this paper, we address binary Visual Question Answering (VQA) on abstract scenes. We formulate this problem as visual verification of the concepts inquired about in the questions. Specifically, we convert the question to a tuple that concisely summarizes the visual concept to be detected in the image. If the concept can be found in the image, the answer to the question is "yes", and otherwise "no". Abstract scenes play two roles: (1) They allow us to focus on the high-level semantics of the VQA task as opposed to the…
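
The verification formulation in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the (primary, relation, secondary) tuple layout, the rule-based `question_to_tuple` parser, and the representation of a scene as a set of known facts are all hypothetical simplifications introduced here. A real system would need a learned model to decide whether the concept is present in the image.

```python
from typing import NamedTuple, Set


class ConceptTuple(NamedTuple):
    """Hypothetical tuple layout summarizing the concept asked about."""
    primary: str
    relation: str
    secondary: str


ARTICLES = {"the", "a", "an"}


def question_to_tuple(question: str) -> ConceptTuple:
    """Toy parser for questions shaped like 'Is the <P> <relation> the <S>?'."""
    words = question.lower().rstrip("?").split()
    # Drop the leading auxiliary ("is") and any articles.
    content = [w for w in words[1:] if w not in ARTICLES]
    if len(content) < 3:
        raise ValueError(f"cannot parse question: {question!r}")
    return ConceptTuple(content[0], " ".join(content[1:-1]), content[-1])


def answer(question: str, scene_facts: Set[ConceptTuple]) -> str:
    """Answer 'yes' if the concept is found in the scene, otherwise 'no'."""
    return "yes" if question_to_tuple(question) in scene_facts else "no"


# Hypothetical scene, described directly by the concepts it contains.
scene = {ConceptTuple("dog", "sitting on", "couch")}

print(answer("Is the dog sitting on the couch?", scene))  # yes
print(answer("Is the cat sitting on the couch?", scene))  # no
```

The exact set lookup stands in for the visual verification step; it only shows how a yes/no answer follows once the question has been reduced to a concept that is either found in the image or not.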

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning