Dual Recurrent Attention Units for Visual Question Answering

Ahmed Osman; Wojciech Samek

arXiv:1802.00209·cs.AI·March 27, 2019·6 cites

TL;DR

This paper introduces a recurrent attention mechanism for visual question answering and compares it with traditional convolutional attention. The resulting model outperforms the first-place entry of the VQA 2016 challenge and ranks second best on the VQA 1.0 dataset.

Contribution

The paper proposes dual (textual and visual) Recurrent Attention Units (RAUs) for VQA, evaluates them against convolutional attention in ablation studies, and reports that they improve the performance of existing state-of-the-art models.

Findings

01 Outperforms the first-place entry of the VQA 2016 challenge

02 Second-best result on the VQA 1.0 dataset

03 Improves the performance of state-of-the-art models

Abstract

Visual Question Answering (VQA) requires AI models to comprehend data in two domains, vision and text. Current state-of-the-art models use learned attention mechanisms to extract relevant information from the input domains to answer a certain question. Thus, robust attention mechanisms are essential for powerful VQA models. In this paper, we propose a recurrent attention mechanism and show its benefits compared to the traditional convolutional approach. We perform two ablation studies to evaluate recurrent attention. First, we introduce a baseline VQA model with visual attention and test the performance difference between convolutional and recurrent attention on the VQA 2.0 dataset. Secondly, we design an architecture for VQA which utilizes dual (textual and visual) Recurrent Attention Units (RAUs). Using this model, we show the effect of all possible combinations of recurrent and…
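
To make the contrast between convolutional and recurrent attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes the image is represented as a sequence of N region feature vectors, that convolutional attention amounts to a 1x1 convolution (a per-region linear map), and that a plain tanh recurrence stands in for whatever recurrent cell the paper uses. All function names, shapes, and the random weights are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def conv_attention(feats, w_hidden, w_score):
    """1x1-convolution-style attention: every region is scored independently."""
    hidden = np.maximum(feats @ w_hidden, 0.0)  # (N, H), ReLU
    scores = hidden @ w_score                   # (N,)
    return softmax(scores)

def recurrent_attention(feats, w_in, w_rec, w_score):
    """Recurrent attention: the score of region t depends on regions 0..t
    through the hidden state (simple tanh RNN, an assumption of this sketch)."""
    h = np.zeros(w_rec.shape[0])
    scores = np.empty(feats.shape[0])
    for t, x in enumerate(feats):
        h = np.tanh(x @ w_in + h @ w_rec)
        scores[t] = h @ w_score
    return softmax(scores)

def attend(feats, weights):
    # Attention-weighted sum of region features -> one context vector.
    return weights @ feats

# Hypothetical sizes: 6 regions, 8-dim features, 5 hidden units.
rng = np.random.default_rng(0)
N, D, H = 6, 8, 5
feats = rng.normal(size=(N, D))
w_hidden = rng.normal(size=(D, H))
w_in = rng.normal(size=(D, H))
w_rec = rng.normal(size=(H, H)) * 0.1
w_score = rng.normal(size=H)

conv_w = conv_attention(feats, w_hidden, w_score)
rec_w = recurrent_attention(feats, w_in, w_rec, w_score)
conv_ctx = attend(feats, conv_w)
rec_ctx = attend(feats, rec_w)
```

In this sketch the convolutional variant scores each region in isolation, so shuffling the regions merely shuffles the weights, whereas the recurrent variant carries context from earlier regions into later scores. Both produce a normalized weight per region and a single attended feature vector.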

Code & Models

Repositories

ahmedmagdiosman/compress-vqa
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning