Learning to Count Objects in Natural Images for Visual Question Answering

Yan Zhang; Jonathon Hare; Adam Prügel-Bennett

arXiv:1802.05766·cs.CV·February 19, 2018·155 cites


TL;DR

This paper introduces a neural network component that improves object counting in natural images for VQA, achieving state-of-the-art accuracy on the number category of VQA v2 without negatively affecting other question categories.

Contribution

It identifies soft attention as a fundamental cause of poor counting in VQA models and presents a counting component that circumvents the problem by counting robustly from object proposals.

Findings

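01. State-of-the-art accuracy on the number category of VQA v2, with a single model outperforming ensembles

02. 6.6% improvement in counting over a strong baseline on the balanced pair metric

03. Effective on a toy task as well as on the real dataset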

Abstract

Visual Question Answering (VQA) models have struggled with counting objects in natural images so far. We identify a fundamental problem due to soft attention in these models as a cause. To circumvent this problem, we propose a neural network component that allows robust counting from object proposals. Experiments on a toy task show the effectiveness of this component and we obtain state-of-the-art accuracy on the number category of the VQA v2 dataset without negatively affecting other categories, even outperforming ensemble models with our single model. On a difficult balanced pair metric, the component gives a substantial improvement in counting over a strong baseline by 6.6%.
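The soft attention problem mentioned in the abstract can be illustrated with a small sketch. It is not the paper's counting component. It only shows, under toy assumptions, why a softmax-normalized weighted sum of object features cannot distinguish one object from two identical ones. The helper names (`softmax`, `attend`), the 2-dimensional feature vectors, and the attention scores are all hypothetical and chosen purely for illustration.

```python
import math

def softmax(scores):
    """Normalize raw attention scores so the weights sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(features, scores):
    """Soft attention: weighted sum of feature vectors (hypothetical helper)."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]

# Toy 2-d features (assumed for illustration only).
cat = [1.0, 0.0]
background = [0.0, 1.0]

# Scene A: one cat plus background. Scene B: two cats plus background.
# Each cat receives the same high score; the background a low one.
one_cat = attend([cat, background], [10.0, -10.0])
two_cats = attend([cat, cat, background], [10.0, 10.0, -10.0])

# After normalization the two cats split the attention mass between them,
# so the attended feature is (almost exactly) the same as for a single cat.
difference = max(abs(a - b) for a, b in zip(one_cat, two_cats))
print(one_cat, two_cats, difference)
```

Because the weights are forced to sum to 1, the attended feature vector is nearly identical in both scenes, so any classifier applied afterwards has no signal from which to recover the count. This is the motivation for counting from object proposals instead.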


Code & Models

Repositories

Cyanogenoid/vqa-counting (PyTorch, official)


Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning