Found a Reason for me? Weakly-supervised Grounded Visual Question Answering using Capsules

Aisha Urooj Khan; Hilde Kuehne; Kevin Duarte; Chuang Gan; Niels Lobo; Mubarak Shah

arXiv:2105.04836·cs.CV·May 12, 2021

TL;DR

This paper introduces a weakly-supervised grounding method for visual question answering using a capsule module that improves object localization based on question cues, without relying on bounding box annotations.

Contribution

The authors propose a novel capsule-based module with query-based selection for weakly-supervised grounding in VQA, enhancing existing systems' ability to localize relevant objects.

Findings

01. Improved grounding accuracy on CLEVR-Answers and GQA datasets.

02. Comparable VQA performance with enhanced grounding capabilities.

03. Effective integration of capsule module into existing VQA architectures.

Abstract

The problem of grounding VQA tasks has recently seen increased attention in the research community, with most attempts focusing on solving this task by using pre-trained object detectors. However, pre-trained object detectors require bounding box annotations for detecting relevant objects in the vocabulary, which may not always be feasible for real-life large-scale applications. In this paper, we focus on a more relaxed setting: the grounding of relevant visual entities in a weakly supervised manner by training on the VQA task alone. To address this problem, we propose a visual capsule module with a query-based selection mechanism over capsule features that allows the model to focus on relevant regions based on the textual cues about visual information in the question. We show that integrating the proposed capsule module in existing VQA systems significantly improves their…
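To make the idea of query-based capsule selection more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation and the paper's actual module may differ substantially. It assumes, purely for illustration, that each image location carries one activation per capsule type and that the question has already been turned into one score per capsule type. The names `select_capsules`, `activations`, and `query_logits` are hypothetical.

```python
import math
from typing import List, Tuple


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def select_capsules(
    activations: List[List[float]], query_logits: List[float]
) -> Tuple[List[List[float]], List[float]]:
    """Weight capsule activations by a question-derived mask (illustrative assumption).

    activations: [num_locations][num_capsule_types], values in [0, 1].
    query_logits: [num_capsule_types], hypothetical scores computed from the question.
    Returns the masked activations and one grounding score per location.
    """
    mask = softmax(query_logits)  # soft selection over capsule types
    masked = [[a * w for a, w in zip(loc, mask)] for loc in activations]
    grounding = [sum(loc) for loc in masked]  # relevance of each location
    return masked, grounding


# Toy example: 3 image locations, 2 capsule types.
activations = [[0.9, 0.1], [0.2, 0.8], [0.1, 0.1]]
query_logits = [4.0, 0.0]  # the question strongly favors capsule type 0
masked, grounding = select_capsules(activations, query_logits)
best_location = max(range(len(grounding)), key=grounding.__getitem__)
```

In this toy setup the location whose capsule of the favored type is most active receives the highest grounding score, even though no bounding box was ever provided. That is the sense in which the supervision is weak: in a trained system, the selection would be learned from the VQA answer loss alone.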

Code & Models

Repositories

aurooj/WeakGroundedVQA_Capsules (tf, Official)
