Scaling Local Self-Attention for Parameter Efficient Visual Backbones

Ashish Vaswani; Prajit Ramachandran; Aravind Srinivas; Niki Parmar; Blake Hechtman; Jonathon Shlens

arXiv:2103.12731·cs.CV·June 8, 2021

TL;DR

This paper introduces HaloNets, a new self-attention model family optimized for efficiency and accuracy, outperforming traditional convolutional models on ImageNet and other vision tasks.

Contribution

The paper develops two extensions to self-attention and an efficient implementation, creating HaloNets that achieve state-of-the-art results in parameter-limited settings.

Findings

01 HaloNets achieve top accuracy on ImageNet with fewer parameters.

02 HaloNets outperform larger models in transfer learning tasks.

03 Local self-attention hybrids improve object detection and segmentation results.

Abstract

Self-attention has the promise of improving computer vision systems due to parameter-independent scaling of receptive fields and content-dependent interactions, in contrast to parameter-dependent scaling and content-independent interactions of convolutions. Self-attention models have recently been shown to have encouraging improvements on accuracy-parameter trade-offs compared to baseline convolutional models such as ResNet-50. In this work, we aim to develop self-attention models that can outperform not just the canonical baseline models, but even the high-performing convolutional models. We propose two extensions to self-attention that, in conjunction with a more efficient implementation of self-attention, improve the speed, memory usage, and accuracy of these models. We leverage these improvements to develop a new self-attention model family, HaloNets, which reach state-of-the-art…
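The contrast the abstract draws between parameter-dependent and parameter-independent scaling of receptive fields can be made concrete with a small parameter count. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, both layers are assumed to map `channels` to `channels` without biases, and the attention layer counts only its query, key, and value projections, leaving out positional encodings and any output projection (which would add parameters in practice).

```python
def conv_params(channels: int, kernel_size: int) -> int:
    """Weights in a k x k convolution mapping `channels` -> `channels` (no bias).

    The count grows quadratically with the kernel size, i.e. with the
    receptive field of the layer.
    """
    return kernel_size * kernel_size * channels * channels


def local_attention_params(channels: int, window_size: int) -> int:
    """Weights in the query, key, and value projections of one local
    self-attention layer (assumption: no bias, no positional encodings,
    no output projection).

    `window_size` is accepted only to make the point that it does not
    enter the count: the same three projections are applied at every
    position regardless of how many neighbours are attended to.
    """
    del window_size  # intentionally unused
    return 3 * channels * channels


if __name__ == "__main__":
    channels = 64
    for size in (3, 7, 11):
        print(
            f"receptive field {size}x{size}: "
            f"conv={conv_params(channels, size)}, "
            f"attention={local_attention_params(channels, size)}"
        )
```

Under these assumptions the convolution's weight count rises with every increase in receptive field, while the attention layer's stays fixed, which is the sense in which the abstract calls the scaling parameter-independent.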

Taxonomy

Methods: HaloNet