Swin-Free: Achieving Better Cross-Window Attention and Efficiency with Size-varying Window

Jinkyu Koo; John Yang; Le An; Gwenaelle Cunha Sergio; Su Inn Park

arXiv:2306.13776·cs.CV·June 27, 2023·1 cites

TL;DR

Swin-Free introduces size-varying windows across stages in transformer models to improve cross-window connectivity and efficiency, outperforming Swin Transformer in speed and accuracy without shifting windows.

Contribution

The paper proposes a novel size-varying window approach that replaces shifting windows in Swin Transformer, enhancing efficiency and accuracy in vision tasks.

Findings

1. Swin-Free runs faster than Swin Transformer at inference.
2. Swin-Free achieves better accuracy than Swin Transformer.
3. Variants of Swin-Free are also faster than their Swin counterparts.

Abstract

Transformer models have shown great potential in computer vision, following their success in language tasks. Swin Transformer is one such model: it outperforms convolution-based architectures in accuracy while improving efficiency compared to Vision Transformer (ViT) and its variants, which have quadratic complexity with respect to the input size. Swin Transformer features shifting windows that allow cross-window connection while limiting self-attention computation to non-overlapping local windows. However, shifting windows introduce memory copy operations, which account for a significant portion of its runtime. To mitigate this issue, we propose Swin-Free, in which we apply size-varying windows across stages, instead of shifting windows, to achieve cross-connection among local windows. With this simple design change, Swin-Free runs faster than the Swin Transformer at…
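
The contrast the abstract draws between shifted windows and size-varying windows can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The 8x8 feature map, the window sizes 4 and 8, and the helper names (window_partition, shifted_window_partition, window_index) are assumptions chosen for illustration, and the sketch keeps the feature map at a single resolution, whereas a real hierarchical model changes resolution between stages.

```python
import numpy as np


def window_partition(x, window_size):
    """Split an (H, W, C) feature map into non-overlapping square windows.

    Returns an array of shape (num_windows, window_size * window_size, C);
    self-attention would then be computed independently inside each window.
    """
    H, W, C = x.shape
    assert H % window_size == 0 and W % window_size == 0
    x = x.reshape(H // window_size, window_size, W // window_size, window_size, C)
    x = x.transpose(0, 2, 1, 3, 4)  # group the two window-grid axes together
    return x.reshape(-1, window_size * window_size, C)


def shifted_window_partition(x, window_size):
    """Shifted-window style partition: cyclically shift by half a window first.

    np.roll returns a new array, so the shift costs an extra memory copy
    (an analogue of the memory copy overhead the abstract refers to).
    """
    shift = window_size // 2
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, window_size)


def window_index(row, col, window_size, width):
    """Index of the window containing token (row, col), in row-major order."""
    return (row // window_size) * (width // window_size) + (col // window_size)


# Hypothetical 8x8 single-channel feature map.
feature_map = np.arange(64, dtype=np.float32).reshape(8, 8, 1)

# Window size 4: tokens (3, 3) and (4, 4) sit in different windows,
# so they cannot attend to each other without some cross-window mechanism.
small = window_partition(feature_map, 4)

# Window size 8 (a larger window, no shift): both tokens share one window.
large = window_partition(feature_map, 8)
```

In this toy setting, tokens (3, 3) and (4, 4) fall into different windows at window size 4 but into the same window at window size 8, so changing the window size connects tokens across the earlier window boundaries without a roll operation.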

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Linear Layer · Layer Normalization · Position-Wise Feed-Forward Layer · Stochastic Depth · Dense Connections · Label Smoothing · Adam