Conditionally Learn to Pay Attention for Sequential Visual Task

Jun He; Quan-Jie Cao; Lei Zhang

arXiv:1911.04365 · cs.CV · April 2, 2020

TL;DR

This paper introduces a novel conditional attention framework for sequential visual tasks, leveraging a global feature descriptor to improve focus on relevant objects, outperforming existing soft attention methods on SVHN and image captioning.

Contribution

It proposes a new conditional attention mechanism built on a global feature descriptor, which can be paired with different recurrent structures for different visual tasks.

Findings

01

Achieves state-of-the-art results on the SVHN dataset.

02

Achieves higher scores than soft attention on image captioning.

03

Effective across multiple visual tasks with different recurrent modules.

Abstract

Sequential visual tasks usually require attending to the currently relevant object, conditioned on previous observations. Unlike the popular soft attention mechanism, we propose a new attention framework that introduces a novel conditional global feature, which serves as a weak feature descriptor of the currently focused object. Specifically, in a standard CNN (Convolutional Neural Network) pipeline, convolutional layers with different receptive fields produce attention maps by measuring how well their features align with the conditional global feature. The conditional global feature can be generated by different recurrent structures depending on the visual task, such as a simple recurrent neural network for multiple-object recognition or a moderately complex language model for image captioning. Experiments show that our proposed conditional…
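The abstract combines two pieces: attention maps computed by aligning convolutional features with a conditional global feature, and a recurrent module that produces that global feature step by step. Below is a minimal pure-Python sketch of both, under several assumptions not stated in the abstract: dot-product alignment, a softmax over spatial locations, features from each layer already projected to a common channel dimension, and a plain tanh RNN whose input is the previous step's attended features. The function names (`conditional_attention`, `rnn_step`, `run_sequence`) and the toy weights are hypothetical and are not taken from the authors' repository.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, v) for row in W]


def conditional_attention(local_feats, g):
    """Attention for one conv layer (hypothetical helper).

    local_feats: one C-dim vector per spatial location (H*W flattened).
    g: C-dim conditional global feature.
    Returns (attention map over locations, attended C-dim feature).
    """
    # Assumed dot-product alignment between each location's feature and g.
    scores = [dot(feat, g) for feat in local_feats]
    alpha = softmax(scores)
    # Attention-weighted sum of the local features.
    attended = [sum(a * feat[k] for a, feat in zip(alpha, local_feats))
                for k in range(len(g))]
    return alpha, attended


def rnn_step(h_prev, x_prev, W_h, W_x):
    """Plain tanh RNN cell, an assumed stand-in for the recurrent module."""
    return [math.tanh(a + b)
            for a, b in zip(matvec(W_h, h_prev), matvec(W_x, x_prev))]


def run_sequence(layers, steps, W_h, W_x, dim):
    """Unroll the RNN; its new state is used as g_t at each step."""
    h = [0.0] * dim
    x = [0.0] * (dim * len(layers))    # no observation before the first step
    outputs = []
    for _ in range(steps):
        g = rnn_step(h, x, W_h, W_x)   # conditional global feature g_t
        maps, feats = [], []
        for local_feats in layers:     # layers with different receptive fields
            alpha, attended = conditional_attention(local_feats, g)
            maps.append(alpha)
            feats.extend(attended)
        h, x = g, feats                # condition the next step on this one
        outputs.append((g, maps))
    return outputs


# Toy usage: two conv layers, C = 2 channels each, flattened to
# lists of per-location feature vectors (made-up values).
layers = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],   # finer layer: 3 locations
    [[0.5, 0.5], [2.0, 0.0]],               # coarser layer: 2 locations
]
W_h = [[0.1, 0.2], [0.3, -0.1]]
W_x = [[0.1, 0.1, 0.1, 0.1], [-0.2, -0.2, -0.2, -0.2]]
outputs = run_sequence(layers, steps=3, W_h=W_h, W_x=W_x, dim=2)
```

With a zero initial state, the first step attends uniformly over each layer's locations; later steps shift attention as the recurrent state absorbs the features attended at earlier steps. Replacing `rnn_step` with an LSTM or a language-model decoder would loosely correspond to the captioning variant the abstract mentions.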

Code & Models

Repositories

caoquanjie/ConditionalLearnToPayAttention
TensorFlow · Official