Benchmark Visual Question Answer Models by using Focus Map

Wenda Qiu; Yueyang Xianzang; Zhekai Zhang

arXiv:1801.05302 · cs.CV · January 17, 2018 · 2 citations

TL;DR

This paper introduces a method for evaluating the focus maps of visual reasoning models and uses it to show that the CLEVR-iep model learns to focus on relevant objects more effectively than end-to-end models.

Contribution

It proposes a novel evaluation approach for focus maps in visual reasoning models and applies it to compare different models on the CLEVR dataset.
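The evaluation approach depends on scoring a focus map against the objects a question is actually about. The abstract does not state the exact metric, so this sketch assumes a simple one: the share of focus-map mass that falls inside a binary segmentation mask of the relevant objects. The function names, the metric, the model names and the toy data are all hypothetical and only illustrate how two models' maps could be compared on the same question.

```python
from typing import Dict, List

# Focus maps and masks are H x W grids stored as lists of rows.
Grid = List[List[float]]


def focus_mass_in_mask(focus: Grid, mask: List[List[int]]) -> float:
    """Share of the total focus mass lying on pixels marked relevant in `mask`.

    Hypothetical metric: 1.0 means all focus is on the relevant objects,
    0.0 means none of it is. Focus values are assumed to be non-negative
    (gradient norms are).
    """
    if len(focus) != len(mask) or any(len(f) != len(m) for f, m in zip(focus, mask)):
        raise ValueError("focus map and mask must have the same H x W shape")
    total = sum(sum(row) for row in focus)
    if total == 0:
        return 0.0
    inside = sum(
        value
        for focus_row, mask_row in zip(focus, mask)
        for value, relevant in zip(focus_row, mask_row)
        if relevant
    )
    return inside / total


def compare_models(focus_maps: Dict[str, Grid], mask: List[List[int]]) -> Dict[str, float]:
    """Score each model's focus map for the same question against one mask."""
    return {name: focus_mass_in_mask(fm, mask) for name, fm in focus_maps.items()}


# Toy 4 x 4 example: the relevant object occupies the top-left 2 x 2 block.
mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
focus_maps = {
    # Hypothetical model that concentrates on the target object.
    "model_a": [[4.0, 4.0, 0.0, 0.0], [4.0, 4.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0]],
    # Hypothetical model whose focus is spread uniformly over the image.
    "model_b": [[1.0] * 4 for _ in range(4)],
}
scores = compare_models(focus_maps, mask)
# scores == {"model_a": 0.8, "model_b": 0.25}
```

A mass-based score avoids choosing a threshold for the focus map; a thresholded overlap measure such as IoU would be an equally plausible alternative.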

Findings

1. The CLEVR-iep model learns to focus on relevant objects more than end-to-end models do.
2. The evaluation method can be applied to any model whose focus maps can be inferred.
3. Focus maps correlate with model performance on visual reasoning tasks.

Abstract

"Inferring and Executing Programs for Visual Reasoning" proposes a model for visual reasoning that consists of a program generator and an execution engine, avoiding end-to-end models. To show that the model actually learns which objects to focus on to answer the questions, its authors visualize the norm of the gradient of the sum of the predicted answer scores with respect to the final feature map. However, they do not evaluate how effective these focus maps are. This paper proposes a method for evaluating them. We generate several kinds of questions to test different keywords, infer focus maps from the model by asking these questions, and evaluate the maps by comparing them with the segmentation graph. Furthermore, this method can be applied to any model from which focus maps can be inferred. By evaluating the focus maps of different models on the CLEVR dataset, we will show that…
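The focus map described in the abstract is the norm of the gradient of the summed answer scores with respect to the final feature map. This minimal sketch computes it for a black-box scoring function using central finite differences, assuming the norm is the L2 norm taken over channels at each spatial position; a real network such as CLEVR-iep would use automatic differentiation instead. The linear answer head (`make_linear_head`), its weights and the feature values are hypothetical stand-ins chosen so the expected map can be checked by hand.

```python
import math
from typing import Callable, List

# Feature map indexed as feature[c][h][w]: C channels on an H x W grid.
FeatureMap = List[List[List[float]]]
ScoreFn = Callable[[FeatureMap], List[float]]


def gradient_norm_focus_map(score_fn: ScoreFn, feature: FeatureMap,
                            eps: float = 1e-4) -> List[List[float]]:
    """Focus map M[h][w] = L2 norm over c of d(sum_k s_k) / d feature[c][h][w].

    The gradient is estimated by central finite differences, so score_fn can
    be any black box returning a list of answer scores.
    """
    work = [[list(row) for row in channel] for channel in feature]  # leave input untouched
    channels, height, width = len(work), len(work[0]), len(work[0][0])
    focus = [[0.0] * width for _ in range(height)]
    for h in range(height):
        for w in range(width):
            squared = 0.0
            for c in range(channels):
                original = work[c][h][w]
                work[c][h][w] = original + eps
                plus = sum(score_fn(work))    # summed answer scores, nudged up
                work[c][h][w] = original - eps
                minus = sum(score_fn(work))   # summed answer scores, nudged down
                work[c][h][w] = original      # restore before the next entry
                grad = (plus - minus) / (2 * eps)
                squared += grad * grad
            focus[h][w] = math.sqrt(squared)
    return focus


def make_linear_head(weights, biases) -> ScoreFn:
    """Hypothetical linear head: s_k = b_k + sum over c,h,w of W[k][c][h][w] * F[c][h][w]."""
    def score_fn(feature: FeatureMap) -> List[float]:
        scores = []
        for w_k, b_k in zip(weights, biases):
            s = b_k
            for w_c, f_c in zip(w_k, feature):
                for w_row, f_row in zip(w_c, f_c):
                    s += sum(a * b for a, b in zip(w_row, f_row))
            scores.append(s)
        return scores
    return score_fn


# Toy example: C = 2 channels, H = 1, W = 2, two possible answers.
weights = [
    [[[3.0, 0.0]], [[0.0, 0.0]]],  # answer 0 reads channel 0 at (0, 0)
    [[[0.0, 0.0]], [[4.0, 0.0]]],  # answer 1 reads channel 1 at (0, 0)
]
head = make_linear_head(weights, biases=[0.1, -0.2])
feature = [[[1.0, 2.0]], [[0.5, -1.0]]]
focus = gradient_norm_focus_map(head, feature)
# focus is approximately [[5.0, 0.0]]
```

Because the toy head is linear, the gradient of the summed scores at position (0, 0) is (3, 4) across the two channels, so the focus value there is 5, while position (0, 1) receives no gradient at all.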

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning