Dilated Neighborhood Attention Transformer
Ali Hassani, Humphrey Shi

TL;DR
This paper introduces Dilated Neighborhood Attention (DiNA) and a hierarchical vision transformer, DiNAT, which effectively expand receptive fields and improve performance across multiple vision tasks without additional computational cost.
Contribution
The paper proposes DiNA, an extension of Neighborhood Attention (NA) that captures more global context, and introduces DiNAT, a hierarchical transformer that combines NA's local attention with DiNA's sparse global attention.
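To make the mechanism concrete, here is a minimal 1D sketch of dilated neighborhood attention in plain Python. It is an illustration under stated assumptions, not the authors' implementation. The function names are hypothetical, inputs are single-head lists of vectors, and the relative positional bias is omitted. It assumes that tokens sharing the same index modulo the dilation form one group, and that windows near a boundary are shifted inward so every query still sees `kernel_size` keys.

```python
import math


def neighbor_indices(length, kernel_size, dilation=1):
    """Return, for each query position, the key positions it attends to.

    Assumption: windows are shifted inward at the boundaries so that every
    query always gets exactly `kernel_size` neighbors.
    """
    if kernel_size % 2 != 1:
        raise ValueError("kernel_size must be odd")
    if kernel_size * dilation > length:
        raise ValueError("kernel_size * dilation must not exceed length")
    half = kernel_size // 2
    rows = []
    for i in range(length):
        group = i % dilation   # tokens with equal i % dilation share a group
        pos = i // dilation    # position of token i inside its group
        group_len = (length - group + dilation - 1) // dilation
        # Centre the window on `pos`, then clamp it to stay inside the group.
        start = min(max(pos - half, 0), group_len - kernel_size)
        rows.append([group + dilation * (start + j) for j in range(kernel_size)])
    return rows


def dilated_neighborhood_attention(q, k, v, kernel_size, dilation=1):
    """Softmax attention restricted to each query's dilated neighborhood."""
    dim = len(q[0])
    out = []
    for i, nbrs in enumerate(neighbor_indices(len(q), kernel_size, dilation)):
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(dim)
                  for j in nbrs]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        out.append([sum(e / total * v[j][c] for e, j in zip(exps, nbrs))
                    for c in range(len(v[0]))])
    return out
```

With `dilation=1` this reduces to ordinary neighborhood attention. Raising the dilation spreads the same number of keys over a wider span, so the work per query stays at `kernel_size` dot products.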
Findings
DiNAT outperforms strong baselines like NAT, Swin, and ConvNeXt.
Large DiNAT achieves state-of-the-art results on COCO, ADE20K, and Cityscapes benchmarks.
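The large DiNAT variant is faster than its Swin counterpart while being more accurate on COCO object detection, COCO instance segmentation, and ADE20K semantic segmentation.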
Abstract
Transformers are quickly becoming one of the most heavily applied deep learning architectures across modalities, domains, and tasks. In vision, on top of ongoing efforts into plain transformers, hierarchical transformers have also gained significant attention, thanks to their performance and easy integration into existing frameworks. These models typically employ localized attention mechanisms, such as the sliding-window Neighborhood Attention (NA) or Swin Transformer's Shifted Window Self Attention. While effective at reducing self attention's quadratic complexity, local attention weakens two of the most desirable properties of self attention: long range inter-dependency modeling, and global receptive field. In this paper, we introduce Dilated Neighborhood Attention (DiNA), a natural, flexible and efficient extension to NA that can capture more global context and expand receptive…
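The trade-off between local attention and receptive field can be illustrated with a little arithmetic. The sketch below uses the standard sliding-window formula for a 1D stack of layers: each layer with kernel size k and dilation d widens the span by (k - 1) * d. The function name and the dilation schedules are illustrative assumptions, not the configurations used in the paper, and boundary effects are ignored.

```python
def receptive_field(kernel_size, dilations):
    """Span of input tokens (1D) that can influence one interior output token
    after stacking one sliding-window layer per entry in `dilations`."""
    span = 1
    for d in dilations:
        span += (kernel_size - 1) * d
    return span


# Four layers with kernel size 3. Every layer attends to 3 keys per query,
# so the per-layer attention cost is the same in both schedules.
local_only = receptive_field(3, [1, 1, 1, 1])   # undilated: linear growth
dilated = receptive_field(3, [1, 3, 9, 27])     # growing dilation
print(local_only, dilated)
```

In this example the undilated stack covers 9 tokens and the dilated stack covers 81, with the same number of keys per query in every layer.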
Code & Models
- 🤗 shi-labs/dinat-mini-in1k-224 · model · 107 dl · ♡ 1
- 🤗 shi-labs/dinat-small-in1k-224 · model · 8 dl
- 🤗 shi-labs/dinat-base-in1k-224 · model · 8 dl
- 🤗 shi-labs/dinat-large-in22k-in1k-224 · model · 6 dl
- 🤗 shi-labs/dinat-large-in22k-in1k-384 · model · 8 dl
- 🤗 shi-labs/dinat-large-11x11-in22k-in1k-384 · model · 7 dl
- 🤗 shi-labs/dinat-tiny-in1k-224 · model · 7 dl
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Brain Tumor Detection and Classification
Methods: ConvNeXt · Transformer · Neighborhood Attention
