Kronecker Attention Networks

Hongyang Gao; Zhengyang Wang; Shuiwang Ji

arXiv:2007.08442 · cs.CV · July 17, 2020

TL;DR

This paper introduces Kronecker Attention Operators (KAOs), which operate directly on high-order tensor data by assuming the data follow matrix-variate normal distributions. KAOs reduce the required computational resources by a factor of hundreds while remaining competitive with the original attention operators.

Contribution

The paper proposes a novel attention mechanism for high-order data that avoids flattening and leverages matrix-variate normal distributions, leading to substantial computational savings.

Findings

01 KAOs reduce the required computational resources by a factor of hundreds.

02 Networks with KAOs outperform models without attention.

03 KAOs achieve performance competitive with the original attention operators.
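
To give a rough sense of where savings of this order can come from, the sketch below counts the entries of the query-key score matrix for an h x w feature map. It is an illustration under stated assumptions, not the paper's implementation: the summary on this page does not say how KAOs are built, so the code assumes a hypothetical scheme in which keys (and optionally queries) are summarized by h row means and w column means instead of h*w flattened positions. All function names are made up for this example, and the ratios are operation counts, not measured results.

```python
def attention_score_entries(num_queries: int, num_keys: int) -> int:
    """Number of entries in the query-key score matrix."""
    return num_queries * num_keys


def flattened_cost(h: int, w: int) -> int:
    # Flattening an h x w feature map yields h*w queries and h*w keys.
    n = h * w
    return attention_score_entries(n, n)


def summarized_cost(h: int, w: int, summarize_queries: bool = False) -> int:
    # ASSUMPTION (not from the source text): keys are replaced by
    # h row means plus w column means, giving h + w keys.
    num_keys = h + w
    # Optionally apply the same summary to the queries.
    num_queries = h + w if summarize_queries else h * w
    return attention_score_entries(num_queries, num_keys)


if __name__ == "__main__":
    for side in (16, 32, 64):
        full = flattened_cost(side, side)
        keys_only = summarized_cost(side, side)
        both = summarized_cost(side, side, summarize_queries=True)
        # Print how many times smaller the score matrix becomes.
        print(side, full // keys_only, full // both)
```

Under these assumptions the reduction factor grows with the spatial size: summarizing only the keys shrinks the score matrix by hw / (h + w), and summarizing queries as well shrinks it by (hw)^2 / (h + w)^2.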

Abstract

Attention operators have been applied on both 1-D data like texts and higher-order data such as images and videos. Use of attention operators on high-order data requires flattening of the spatial or spatial-temporal dimensions into a vector, which is assumed to follow a multivariate normal distribution. This not only incurs excessive requirements on computational resources, but also fails to preserve structures in data. In this work, we propose to avoid flattening by assuming the data follow matrix-variate normal distributions. Based on this new view, we develop Kronecker attention operators (KAOs) that operate on high-order tensor data directly. More importantly, the proposed KAOs lead to dramatic reductions in computational resources. Experimental results show that our methods reduce the amount of required computational resources by a factor of hundreds, with larger factors for…
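
For readers unfamiliar with the matrix-variate normal distribution, its standard textbook definition shows where the Kronecker product enters. The symbols below (X, M, U, V, h, w) are notation chosen for this note and are not taken from the abstract.

```latex
\[
X \sim \mathcal{MN}_{h \times w}(M,\, U,\, V)
\quad\Longleftrightarrow\quad
\operatorname{vec}(X) \sim \mathcal{N}_{hw}\bigl(\operatorname{vec}(M),\; V \otimes U\bigr),
\]
\[
U \in \mathbb{R}^{h \times h} \ \text{(covariance among rows)},
\qquad
V \in \mathbb{R}^{w \times w} \ \text{(covariance among columns)}.
\]
```

A general multivariate normal on the flattened vector has an hw x hw covariance matrix, whereas the matrix-variate form describes it with one h x h matrix and one w x w matrix, which is the structural saving the abstract alludes to.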
