Variational Structured Attention Networks for Deep Visual Representation Learning

Guanglei Yang; Paolo Rota; Xavier Alameda-Pineda; Dan Xu; Mingli Ding; Elisa Ricci

arXiv:2103.03510 · cs.CV · December 16, 2021

TL;DR

This paper introduces VISTA-Net, a unified probabilistic framework that jointly learns structured spatial and channel attention for enhanced deep visual representation, significantly improving performance on various dense prediction tasks.

Contribution

The paper proposes a novel end-to-end trainable model that structures spatial and channel attention and models the interactions between them within a probabilistic framework.

Findings

01. Outperforms state-of-the-art methods on six large-scale datasets
02. Effectively learns spatial and channel attention jointly
03. Improves accuracy in dense visual prediction tasks

Abstract

Convolutional neural networks have enabled major progress in pixel-level prediction tasks such as semantic segmentation, depth estimation and surface normal prediction, benefiting from their powerful capabilities in visual representation learning. Typically, state-of-the-art models integrate attention mechanisms for improved deep feature representations. Recently, some works have demonstrated the significance of learning and combining both spatial-wise and channel-wise attention for deep feature refinement. In this paper, we aim to effectively boost previous approaches and propose a unified deep framework that jointly learns both spatial attention maps and channel attention vectors in a principled manner, so as to structure the resulting attention tensors and model the interactions between these two types of attention. Specifically, we integrate the estimation and the…
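
The abstract is cut off before it describes the estimation procedure, so the sketch below does not reproduce VISTA-Net's variational inference. It only illustrates, under an assumed form, what a "structured" attention tensor built from spatial maps and channel vectors can look like: each of R hypothetical terms pairs an H×W spatial map with a length-C channel vector, their outer products are averaged into a C×H×W tensor, and that tensor reweights the feature map element-wise. The names `structured_attention` and `refine`, the sigmoid squashing and the averaging over terms are illustrative assumptions, not taken from the paper or from the ygjwd12345/VISTA-Net repository.

```python
import math
from typing import List

Matrix = List[List[float]]         # H x W spatial map
Tensor3 = List[List[List[float]]]  # C x H x W tensor


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def structured_attention(spatial_logits: List[Matrix],
                         channel_logits: List[List[float]]) -> Tensor3:
    """ASSUMED form (not VISTA-Net's inference):
    A[c][i][j] = (1/R) * sum_r v_r[c] * m_r[i][j],
    where m_r = sigmoid(spatial_logits[r]) and v_r = sigmoid(channel_logits[r]).
    """
    if not spatial_logits or len(spatial_logits) != len(channel_logits):
        raise ValueError("need the same, non-zero number of spatial and channel terms")
    rank = len(spatial_logits)
    # Squash logits into (0, 1) weights.
    maps = [[[sigmoid(x) for x in row] for row in m] for m in spatial_logits]
    vecs = [[sigmoid(x) for x in v] for v in channel_logits]
    channels = len(vecs[0])
    height, width = len(maps[0]), len(maps[0][0])
    # Average of R outer products couples spatial and channel weights.
    return [[[sum(vecs[r][c] * maps[r][i][j] for r in range(rank)) / rank
              for j in range(width)]
             for i in range(height)]
            for c in range(channels)]


def refine(features: Tensor3, attention: Tensor3) -> Tensor3:
    """Reweight each feature element by the matching attention weight."""
    return [[[f * a for f, a in zip(f_row, a_row)]
             for f_row, a_row in zip(f_c, a_c)]
            for f_c, a_c in zip(features, attention)]


# Usage: zero logits give 0.5 * 0.5 = 0.25 everywhere.
attn = structured_attention([[[0.0, 0.0], [0.0, 0.0]]], [[0.0, 0.0, 0.0]])
feats = [[[2.0, 4.0], [6.0, 8.0]] for _ in range(3)]
refined = refine(feats, attn)
```

Because the tensor is constrained to an average of R outer products, spatial and channel weights are coupled rather than applied as two independent rescalings; how VISTA-Net actually estimates these factors is described in the full paper.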

Code & Models

Repositories

ygjwd12345/VISTA-Net (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition