An Improved Attention for Visual Question Answering

Tanzila Rahman; Shih-Han Chou; Leonid Sigal; Giuseppe Carenini

arXiv:2011.02164·cs.CV·June 7, 2021

TL;DR

This paper introduces an enhanced attention mechanism with an Attention on Attention module within an encoder-decoder framework for Visual Question Answering, significantly improving accuracy on the VQA-v2 benchmark.

Contribution

It proposes a novel Attention on Attention (AoA) module and a multimodal fusion approach, advancing the state-of-the-art in VQA performance.

Findings

01. Achieves state-of-the-art results on the VQA-v2 dataset

02. Demonstrates the effectiveness of AoA in capturing complex dependencies

03. Improves multimodal information integration

Abstract

We consider the problem of Visual Question Answering (VQA). Given an image and a free-form, open-ended question expressed in natural language, the goal of a VQA system is to provide an accurate answer to this question with respect to the image. The task is challenging because it requires simultaneous and intricate understanding of both visual and textual information. Attention, which captures intra- and inter-modal dependencies, has emerged as perhaps the most widely used mechanism for addressing these challenges. In this paper, we propose an improved attention-based architecture to solve VQA. We incorporate an Attention on Attention (AoA) module within an encoder-decoder framework, which is able to determine the relation between attention results and queries. The attention module generates a weighted average for each query. The AoA module, on the other hand, first generates an information vector and an…
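
The contrast the abstract draws between plain attention (a weighted average per query) and AoA can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation and not the API of the repository listed on this page. The abstract is truncated here, so the gating step is an assumption based on the commonly described AoA formulation: an information vector and a sigmoid gate are both computed from the attention result concatenated with the query, then multiplied elementwise. All names (`attention_on_attention`, `W_info`, `W_gate`, and so on) and all sizes are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention(Q, K, V):
    """Scaled dot-product attention: one weighted average of V rows per query."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (n_q, n_kv)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # (n_q, d)


def attention_on_attention(Q, K, V, W_info, b_info, W_gate, b_gate):
    """Assumed AoA step: gate the attended result using the query as context."""
    attended = attention(Q, K, V)                   # plain attention result
    ctx = np.concatenate([attended, Q], axis=-1)    # (n_q, 2d)
    info = ctx @ W_info + b_info                    # information vector
    gate = sigmoid(ctx @ W_gate + b_gate)           # attention gate in (0, 1)
    return gate * info                              # elementwise product


# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
n_q, n_kv, d = 3, 5, 4
Q = rng.normal(size=(n_q, d))
K = rng.normal(size=(n_kv, d))
V = rng.normal(size=(n_kv, d))
W_info = rng.normal(size=(2 * d, d)) * 0.1
W_gate = rng.normal(size=(2 * d, d)) * 0.1
b_info = np.zeros(d)
b_gate = np.zeros(d)

out = attention_on_attention(Q, K, V, W_info, b_info, W_gate, b_gate)
print(out.shape)
```

Under this assumed formulation, the gate lets each query scale down an attended vector that is only weakly related to it, whereas plain attention always returns a weighted average regardless of how relevant the result is.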

Code & Models

Repositories

lucidrains/AoA-pytorch (PyTorch)
