Axial Attention in Multidimensional Transformers

Jonathan Ho; Nal Kalchbrenner; Dirk Weissenborn; Tim Salimans

arXiv:1912.12180 · cs.CV · December 30, 2019 · 364 cites

TL;DR

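The paper introduces Axial Transformers, a self-attention-based autoregressive model for images and other high-dimensional tensor data. It achieves state-of-the-art results on standard generative modeling benchmarks while keeping memory and computation reasonable and remaining easy to implement in standard deep learning frameworks.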

Contribution

It presents axial attention, a new self-attention mechanism that scales efficiently to high-dimensional data and enables state-of-the-art generative modeling performance.
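To make the scaling claim concrete, the sketch below counts query-key pairs for one layer of full self-attention over a flattened tensor versus one axial attention layer per axis. The function names are hypothetical, and the count is a simplification that ignores heads, channel width, and masking; it is an illustration, not a figure taken from the paper.

```python
def num_positions(shape):
    """Total number of positions in a tensor with the given shape."""
    n = 1
    for size in shape:
        n *= size
    return n


def full_attention_pairs(shape):
    """Query-key pairs when every position attends to every position."""
    n = num_positions(shape)
    return n * n


def axial_attention_pairs(shape):
    """Query-key pairs for one axial layer per axis (illustrative assumption).

    Along a given axis, each position attends only to the `size`
    positions that share its other coordinates.
    """
    n = num_positions(shape)
    return sum(n * size for size in shape)


shape = (64, 64)  # e.g. a 64x64 image, channels ignored
print(full_attention_pairs(shape))   # 16777216
print(axial_attention_pairs(shape))  # 524288
```

For a 64x64 grid this simplified count gives a 32x reduction, and the gap widens as the side length grows.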

Findings

01 Achieved state-of-the-art results on ImageNet-32 and ImageNet-64 benchmarks.

02 Demonstrated effectiveness on the BAIR Robotic Pushing video benchmark.

03 Maintained full distribution expressiveness with a semi-parallel decoding structure.

Abstract

We propose Axial Transformers, a self-attention-based autoregressive model for images and other data organized as high dimensional tensors. Existing autoregressive models either suffer from excessively large computational resource requirements for high dimensional data, or make compromises in terms of distribution expressiveness or ease of implementation in order to decrease resource requirements. Our architecture, by contrast, maintains both full expressiveness over joint distributions over data and ease of implementation with standard deep learning frameworks, while requiring reasonable memory and computation and achieving state-of-the-art results on standard generative modeling benchmarks. Our models are based on axial attention, a simple generalization of self-attention that naturally aligns with the multiple dimensions of the tensors in both the encoding and the decoding settings.…
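The idea of attention aligned with a single tensor axis can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it is single-head and unmasked (so it corresponds to the encoding setting rather than autoregressive decoding), it uses identity query, key, and value projections, and the names `attend` and `axial_attention` are hypothetical.

```python
import math


def attend(seq):
    """Unmasked single-head self-attention over a list of feature vectors.

    Assumption for illustration: queries, keys, and values are the inputs
    themselves (identity projections), with dot products scaled by sqrt(dim).
    """
    dim = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in seq]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[c] for e, v in zip(exps, seq))
                    for c in range(dim)])
    return out


def axial_attention(grid, axis):
    """Apply attention along one axis of grid[h][w] (each entry a feature vector).

    axis=1 mixes positions within each row; axis=0 mixes positions within
    each column. Positions on different rows (or columns) never interact
    within a single call.
    """
    height, width = len(grid), len(grid[0])
    if axis == 1:
        return [attend(row) for row in grid]
    columns = [attend([grid[h][w] for h in range(height)]) for w in range(width)]
    return [[columns[w][h] for w in range(width)] for h in range(height)]


grid = [[[0.1 * h + 0.2 * w, 0.3 * h - 0.1 * w] for w in range(3)] for h in range(2)]
mixed = axial_attention(axial_attention(grid, 0), 1)  # column pass, then row pass
print(len(mixed), len(mixed[0]), len(mixed[0][0]))  # 2 3 2
```

Stacking a column pass and a row pass, as in the last lines, lets every output position depend on every input position even though each pass only mixes along one axis.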

Code & Models

Videos

Axial Attention & MetNet: A Neural Weather Model for Precipitation Forecasting · YouTube

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications · Medical Image Segmentation Techniques

Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Axial Attention · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam