2nd Place Solution to the GQA Challenge 2019

Shijie Geng, Ji Zhang, Hang Zhang, Ahmed Elgammal, and Dimitris N. Metaxas

arXiv:1907.06794 · cs.CV · August 20, 2019 · 6 cites

TL;DR

This paper introduces a simple method that collects statistical features from the questions asked about an image and uses them to improve visual question answering, as evidence that feature extraction is a more severe bottleneck than reasoning in complex visual reasoning tasks.

Contribution

The paper collects statistical features from the high-frequency words of all questions asked about an image and uses them as knowledge for answering further questions about that image, showing that the bottleneck lies more in feature extraction than in knowledge reasoning.

Findings

01 With the same reasoning model, statistical features outperform detected features.

02 Using ground-truth features yields the best performance.

03 The method achieved 2nd place in the GQA Challenge 2019.

Abstract

We present a simple method that achieves unexpectedly superior performance for Visual Question Answering involving complex reasoning. Our solution collects statistical features from high-frequency words of all the questions asked about an image and uses them as accurate knowledge for answering further questions about the same image. We are fully aware that this setting is not ubiquitously applicable, and that in a more common setting one should assume the questions are asked separately and cannot be gathered to obtain a knowledge base. Nonetheless, we use this method as evidence to demonstrate our observation that the bottleneck effect is more severe on the feature extraction part than it is on the knowledge reasoning part. We show significant gaps when using the same reasoning model with 1) ground-truth features; 2) statistical features; 3) detected features from completely learned…
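
The collection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the stopword list, the `min_count` threshold, and the function name `build_image_word_stats` are all assumptions made for illustration, and the abstract does not say how the resulting features are encoded for the reasoning model.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stopword list; the filtering actually used in the paper is not given here.
STOPWORDS = {
    "the", "a", "an", "is", "are", "of", "in", "on", "to", "what", "which",
    "who", "where", "how", "do", "does", "there", "that", "this", "it", "or", "and",
}


def tokenize(question):
    """Lowercase a question and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", question.lower())


def build_image_word_stats(qa_pairs, min_count=2):
    """Gather all questions per image and keep the frequently mentioned words.

    qa_pairs: iterable of (image_id, question) tuples.
    min_count: assumed frequency threshold for a word to count as "high-frequency".
    Returns {image_id: {word: count}}.
    """
    counts = defaultdict(Counter)
    for image_id, question in qa_pairs:
        words = [w for w in tokenize(question) if w not in STOPWORDS]
        counts[image_id].update(words)
    return {
        image_id: {w: c for w, c in counter.items() if c >= min_count}
        for image_id, counter in counts.items()
    }


# Toy questions, invented for illustration only.
questions = [
    ("img1", "Is the dog to the left of the red car?"),
    ("img1", "What color is the car?"),
    ("img1", "Is the dog sitting on the grass?"),
    ("img2", "What is on the table?"),
]
stats = build_image_word_stats(questions)
```

In this toy setup, words that recur across the questions about an image act as a rough per-image knowledge base. As the abstract notes, this only works when the questions about an image can be gathered in advance.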

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques