Semantic-Aware Local-Global Vision Transformer

Jiatong Zhang, Zengwei Yao, Fanglin Chen, Guangming Lu, and Wenjie Pei

arXiv:2211.14705 · cs.CV · November 29, 2022

TL;DR

The paper introduces SALG, a vision transformer that incorporates unsupervised semantic segmentation and local-global attention mechanisms, improving feature learning especially in small-scale models.

Contribution

SALG advances vision transformers by integrating semantic priors through unsupervised segmentation and combining local and global attention for enhanced feature representation.

Findings

01. Outperforms other vision Transformers on various tasks.

02. Excels particularly in small-scale model scenarios.

03. Demonstrates the effectiveness of semantic-aware local-global modeling.

Abstract

Vision Transformers have made remarkable progress, and Swin Transformer in particular has demonstrated the tremendous potential of Transformers for vision tasks. It overcomes the key challenge of high computational complexity by performing local self-attention within shifted windows. In this work we propose the Semantic-Aware Local-Global Vision Transformer (SALG) to investigate two potential improvements over Swin Transformer. First, unlike Swin Transformer, which partitions the image uniformly into regular windows of equal size for local self-attention, SALG performs semantic segmentation in an unsupervised way to explore the underlying semantic priors in the image. As a result, each segmented region can correspond to a semantically meaningful part of the image, potentially leading to more effective features within each segmented region. Second, instead of only…
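
The contrast the abstract draws between uniform windows and segmented regions can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single attention head, uses the token features directly as queries, keys, and values (no learned projections), and takes region labels as given rather than computing them by unsupervised segmentation. The function names `region_masked_attention` and `uniform_window_ids` are hypothetical.

```python
import numpy as np


def region_masked_attention(x, region_ids):
    """Self-attention in which each token attends only to tokens in its own region.

    x: (n_tokens, dim) token features, used as queries, keys, and values
       (an assumption made for brevity; real models use learned projections).
    region_ids: (n_tokens,) integer region label per token (assumed given).
    """
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)                       # (n, n) scaled dot products
    same_region = region_ids[:, None] == region_ids[None, :]
    scores = np.where(same_region, scores, -np.inf)     # block cross-region attention
    scores = scores - scores.max(axis=1, keepdims=True) # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ x, weights


def uniform_window_ids(height, width, window):
    """Region labels for a regular grid of window x window patches (no shifting)."""
    rows = np.arange(height)[:, None] // window
    cols = np.arange(width)[None, :] // window
    windows_per_row = -(-width // window)               # ceiling division
    return (rows * windows_per_row + cols).reshape(-1)


# Uniform partition: a 4x4 token grid split into four 2x2 windows.
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8))
ids = uniform_window_ids(4, 4, 2)
out, w = region_masked_attention(x, ids)
```

Passing an irregular label map as `region_ids` (for example, one produced by any clustering or segmentation step) restricts attention to irregular regions through the same mask, which is the general idea behind attending within semantically meaningful regions rather than fixed windows.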

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection

Methods: Multi-Head Attention · Attention Is All You Need · Stochastic Depth · Softmax · Adam · Dropout · Byte Pair Encoding · Swin Transformer · Position-Wise Feed-Forward Layer · Label Smoothing