High-Order Attention Models for Visual Question Answering

Idan Schwartz; Alexander G. Schwing; Tamir Hazan

arXiv:1711.04323 · cs.CV · November 15, 2017 · 45 cites

TL;DR

This paper introduces a novel high-order attention mechanism that captures complex correlations between visual and textual data, significantly improving performance on visual question answering tasks.

Contribution

It proposes a new high-order attention model that learns complex cross-modal correlations for VQA, advancing the state of the art.

Findings

01

Achieved state-of-the-art results on the VQA dataset.

02

Demonstrated the effectiveness of high-order correlations in attention mechanisms.

03

Improved accuracy over existing models in VQA.

Abstract

The quest for algorithms that enable cognitive abilities is an important part of machine learning. A common trait in many recently investigated cognitive-like tasks is that they take into account different data modalities, such as visual and textual input. In this paper we propose a novel and generally applicable form of attention mechanism that learns high-order correlations between various data modalities. We show that high-order correlations effectively direct the appropriate attention to the relevant elements in the different data modalities that are required to solve the joint task. We demonstrate the effectiveness of our high-order attention mechanism on the task of visual question answering (VQA), where we achieve state-of-the-art performance on the standard VQA dataset.
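To make the idea of correlation-driven attention concrete, here is a minimal NumPy sketch of a second-order (pairwise) case between image regions and question words. It is an illustration only, not the authors' implementation: the function name `pairwise_attention`, the projection matrices `W_v` and `W_q`, the tanh embedding, and the mean-based marginalization are all assumptions made for this example, and the random inputs stand in for real features.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def pairwise_attention(V, Q, W_v, W_q):
    """Hypothetical second-order attention between two modalities.

    V:   (n_v, d_v) image-region features
    Q:   (n_q, d_q) question-word features
    W_v: (d_v, k) and W_q: (d_q, k) project both into a shared space
         (assumed parameters, learned in a real model).
    """
    Pv = np.tanh(V @ W_v)            # (n_v, k) embedded regions
    Pq = np.tanh(Q @ W_q)            # (n_q, k) embedded words
    C = Pv @ Pq.T                    # (n_v, n_q) region-word correlations

    # Marginalize the correlations over the other modality to score
    # each element, then normalize the scores into attention weights.
    att_v = softmax(C.mean(axis=1))  # (n_v,) attention over regions
    att_q = softmax(C.mean(axis=0))  # (n_q,) attention over words

    # Attended representations are attention-weighted feature sums.
    return att_v, att_q, att_v @ V, att_q @ Q

# Random placeholder data: 5 regions, 3 words, shared dimension 4.
rng = np.random.default_rng(0)
V = rng.normal(size=(5, 8))
Q = rng.normal(size=(3, 6))
W_v = rng.normal(size=(8, 4))
W_q = rng.normal(size=(6, 4))
att_v, att_q, v_att, q_att = pairwise_attention(V, Q, W_v, W_q)
```

Here each modality's attention depends on how strongly its elements correlate with the elements of the other modality. A higher-order variant would, under the same assumptions, score triples such as region, word, and candidate answer instead of pairs.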

Code & Models

Repositories

idansc/HighOrderAtten
torch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning

Methods: Factor Graph Attention