Dilated Neighborhood Attention Transformer

Ali Hassani; Humphrey Shi

arXiv:2209.15001 · cs.CV · January 18, 2023 · 28 citations

Open Access · 5 repositories · 7 models

TL;DR

This paper introduces Dilated Neighborhood Attention (DiNA) and DiNAT, a hierarchical vision transformer built on Neighborhood Attention (NA) and DiNA. Together they expand the receptive field at no additional computational cost relative to NA and improve performance across multiple vision tasks.
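The claim of no additional computational cost can be checked by counting query-key pairs. The sketch below is illustrative arithmetic, not code from the paper. The function name `attention_pairs` and the 56×56 feature map with a 7×7 window are assumptions chosen for the example. Full self attention scores every token against every other token. NA and DiNA instead score each token against a fixed k×k neighborhood, and that neighborhood's size does not depend on the dilation.

```python
from typing import Optional


def attention_pairs(height: int, width: int,
                    kernel_size: Optional[int] = None, dilation: int = 1) -> int:
    """Count query-key pairs scored by one attention layer on a height x width map.

    kernel_size=None means full (global) self attention. Otherwise each query
    attends to kernel_size**2 keys (a hypothetical helper for illustration).
    """
    n = height * width                 # number of tokens (queries)
    if kernel_size is None:            # full self attention: every query sees every key
        return n * n
    # NA / DiNA: each query sees kernel_size**2 keys. `dilation` only spreads
    # those keys further apart, so it intentionally does not affect the count.
    return n * kernel_size ** 2


print(attention_pairs(56, 56))                             # 9834496
print(attention_pairs(56, 56, kernel_size=7))              # 153664
print(attention_pairs(56, 56, kernel_size=7, dilation=8))  # 153664
```

The counts cover only the attention-weight computation. Real runtime also depends on how efficiently the neighborhood gather is implemented.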

Contribution

The paper proposes DiNA, an extension of Neighborhood Attention that captures more global context. It also introduces DiNAT, a hierarchical transformer that combines local attention (NA) with DiNA to improve performance.
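To make the mechanism concrete, here is a minimal single-head 1D sketch in plain Python. It simplifies things under stated assumptions and is not the authors' implementation. There are no learned projections, no relative positional biases, and no multi-head split. The names `dilated_neighbors` and `dilated_neighborhood_attention` are hypothetical. As in the usual description of NA, each query attends to its k nearest tokens, and the window is shifted at the borders so every query keeps exactly k neighbors. For dilation d, we assume the same rule is applied within each of the d interleaved groups of tokens that share the query's position modulo d.

```python
import math
from typing import List

Matrix = List[List[float]]


def dilated_neighbors(i: int, n: int, kernel_size: int, dilation: int) -> List[int]:
    """Indices of the keys that query i attends to (simplified 1D DiNA).

    Assumption: tokens are split into `dilation` interleaved groups (same
    position modulo `dilation`). Inside its group, query i takes the
    `kernel_size` nearest members, with the window shifted at the borders so
    every query always gets exactly `kernel_size` keys.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size is assumed to be odd")
    group = i % dilation                         # dilation group of query i
    group_len = len(range(group, n, dilation))   # tokens in that group
    if group_len < kernel_size:
        raise ValueError("each dilation group needs at least kernel_size tokens")
    pos = i // dilation                          # position of i inside its group
    start = min(max(pos - kernel_size // 2, 0), group_len - kernel_size)
    return [group + dilation * (start + t) for t in range(kernel_size)]


def dilated_neighborhood_attention(q: Matrix, k: Matrix, v: Matrix,
                                   kernel_size: int, dilation: int = 1) -> Matrix:
    """Single-head dot-product attention restricted to each query's dilated neighborhood."""
    n, dim = len(q), len(q[0])
    scale = 1.0 / math.sqrt(dim)                 # standard dot-product scaling
    out = []
    for i in range(n):
        keys = dilated_neighbors(i, n, kernel_size, dilation)
        scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in keys]
        top = max(scores)                        # subtract max for a stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([
            sum(w * v[j][c] for w, j in zip(weights, keys)) / total
            for c in range(len(v[0]))
        ])
    return out


n = 12
q = [[0.0] for _ in range(n)]          # zero queries give uniform weights
v = [[float(j)] for j in range(n)]     # value = token index, for easy reading
print(dilated_neighbors(5, n, 3, 1))   # [4, 5, 6]
print(dilated_neighbors(5, n, 3, 2))   # [3, 5, 7]
print(dilated_neighborhood_attention(q, q, v, 3, 2)[5])  # [5.0]
```

With `dilation=1` the function reduces to this simplified NA. Raising the dilation widens the span each query covers while keeping exactly `kernel_size` keys per query.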

Findings

01. DiNAT outperforms strong baselines such as NAT, Swin, and ConvNeXt.
02. The large DiNAT variant achieves state-of-the-art results on the COCO, ADE20K, and Cityscapes benchmarks.
03. The large DiNAT model is faster than its Swin counterpart while also being more accurate.

Abstract

Transformers are quickly becoming one of the most heavily applied deep learning architectures across modalities, domains, and tasks. In vision, on top of ongoing efforts into plain transformers, hierarchical transformers have also gained significant attention, thanks to their performance and easy integration into existing frameworks. These models typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention. While effective at reducing self attention's quadratic complexity, local attention weakens two of the most desirable properties of self attention: long range inter-dependency modeling, and global receptive field. In this paper, we introduce Dilated Neighborhood Attention (DiNA), a natural, flexible and efficient extension to NA that can capture more global context and expand receptive…
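The receptive-field argument can be illustrated with simple 1D arithmetic. This is our own idealized count, which ignores border shifting and finite input size. The dilation schedules below are hypothetical examples, not DiNAT's actual configuration. A layer with kernel size k and dilation d widens the receptive field by (k − 1)·d tokens. Stacking L plain NA layers therefore grows it linearly, to L(k − 1) + 1. Multiplying the dilation by k at each layer grows it to k^L.

```python
from typing import Sequence


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """1D receptive field (in tokens) of stacked sliding-window attention layers.

    Each layer with kernel size k and dilation d adds (k - 1) * d tokens,
    assuming an unbounded input (no border effects).
    """
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


k = 7
print(receptive_field(k, [1, 1, 1]))       # NA only: 19 (linear growth)
print(receptive_field(k, [1, k, k * k]))   # dilations 1, 7, 49: 343 == 7 ** 3
print(receptive_field(k, [1, 7, 1, 7]))    # hypothetical alternating NA / DiNA: 97
```

The per-query cost stays at k keys per layer in every schedule. Only the spacing of those keys changes.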

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification

Methods: ConvNeXt · Transformer · Neighborhood Attention