Differential Attention for Visual Question Answering

Badri Patro; Vinay P. Namboodiri

arXiv:1804.00298 · cs.CV · April 4, 2018

TL;DR

This paper introduces a differential attention mechanism for visual question answering that leverages exemplars to better mimic human attention, leading to improved accuracy on benchmark datasets.

Contribution

It proposes an exemplar-based differential attention method that aligns more closely with human focus, enhancing VQA performance over traditional image-based attention approaches.

Findings

1. Outperforms existing image-based attention methods.
2. Achieves competitive results with state-of-the-art models.
3. Improves question-answering accuracy on benchmark datasets.

Abstract

In this paper, we aim to answer questions about images, given a training dataset of question-answer pairs for a number of images. Many methods have approached this problem with image-based attention, which focuses on a specific part of the image while answering the question. Humans do the same when solving this problem. However, the regions that previous systems focus on are not correlated with the regions that humans focus on, and this drawback limits their accuracy. In this paper, we propose to solve this problem with an exemplar-based method. We obtain one or more supporting and opposing exemplars to compute a differential attention region. This differential attention is closer to human attention than other image-based attention methods, and it also improves accuracy when answering questions. The method is…
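The abstract describes differential attention only at a high level. The sketch below is a hedged toy illustration, not the authors' implementation. It assumes (i) question-guided attention is a softmax over dot-product scores between image-region features and a question embedding, and (ii) attention is made "differential" by a triplet-style hinge loss. That loss pulls the target image's attention toward a supporting exemplar's attention and pushes it away from an opposing exemplar's. The function names `attention_map` and `triplet_attention_loss`, the array shapes, the squared-Euclidean distance, and the `margin` value are all hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)


def attention_map(region_feats, question_vec):
    # Hypothetical shapes: region_feats is (R, D) for R image regions,
    # question_vec is (D,). Each region is scored by its dot product with
    # the question, and the scores are normalised into an attention map (R,).
    return softmax(region_feats @ question_vec)


def triplet_attention_loss(a_target, a_support, a_oppose, margin=0.2):
    # Assumed hinge (triplet) loss: the target's attention should be closer
    # to the supporting exemplar's attention than to the opposing exemplar's
    # attention, by at least `margin` (squared Euclidean distance).
    d_pos = np.sum((a_target - a_support) ** 2)
    d_neg = np.sum((a_target - a_oppose) ** 2)
    return max(0.0, float(d_pos - d_neg + margin))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    R, D = 6, 16                       # hypothetical: 6 regions, 16-dim features
    q = rng.normal(size=D)             # stand-in question embedding
    target, support, oppose = (rng.normal(size=(R, D)) for _ in range(3))
    a_t, a_s, a_o = (attention_map(f, q) for f in (target, support, oppose))
    print("target attention:", np.round(a_t, 3))
    print("triplet loss:", triplet_attention_loss(a_t, a_s, a_o))
```

During training, a loss of this kind would be added to the answer-classification loss, so that the attention map is shaped by the exemplars as well as by the final answer. How the paper actually selects exemplars and combines the losses is not reproduced here.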

Code & Models

Repositories

chirag26495/DAN_VQA (PyTorch)