Low-Resolution Self-Attention for Semantic Segmentation
Yu-Huan Wu, Shi-Chen Zhang, Yun Liu, Le Zhang, Xin Zhan, Daquan Zhou, Jiashi Feng, Ming-Ming Cheng, Liangli Zhen

TL;DR
This paper introduces Low-Resolution Self-Attention (LRSA), a mechanism that captures global context for semantic segmentation by computing self-attention at a fixed low resolution regardless of the input size, cutting computational cost (FLOPs) while preserving segmentation accuracy.
Contribution
The paper proposes LRSA, a self-attention mechanism that reduces the computational complexity of vision transformers for segmentation, and demonstrates its effectiveness with LRFormer, an encoder-decoder vision transformer built on it.
Findings
LRFormer outperforms state-of-the-art models on multiple datasets.
LRSA significantly reduces FLOPs compared to traditional high-resolution self-attention.
The approach maintains high segmentation accuracy with lower computational cost.
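To give a feel for why a fixed attention resolution lowers FLOPs, here is a rough back-of-the-envelope sketch. It counts only the two attention matrix products (QK^T and the attention-weighted sum of V) and ignores projections, softmax, pooling, and upsampling. The channel width (256) and pooled grid size (16x16) are illustrative assumptions, not values taken from the paper.

```python
def attention_matmul_flops(num_queries: int, num_keys: int, channels: int) -> int:
    """Rough multiply-add count for the two attention matmuls (QK^T and AV).

    Linear projections, softmax and multi-head bookkeeping are ignored.
    """
    return 2 * num_queries * num_keys * channels


def full_resolution_cost(height: int, width: int, channels: int) -> int:
    """Attention over every position of a height x width feature map."""
    tokens = height * width
    return attention_matmul_flops(tokens, tokens, channels)


def fixed_low_resolution_cost(pool_size: int, channels: int) -> int:
    """Attention over a pooled pool_size x pool_size grid, whatever the input size."""
    tokens = pool_size * pool_size
    return attention_matmul_flops(tokens, tokens, channels)


if __name__ == "__main__":
    channels = 256   # hypothetical channel width
    pool_size = 16   # hypothetical fixed attention grid
    low = fixed_low_resolution_cost(pool_size, channels)
    for side in (32, 64, 128):
        full = full_resolution_cost(side, side, channels)
        print(f"{side}x{side} map: full={full:,}  low-res={low:,}  ratio={full / low:.0f}x")
```

Under these assumptions the full-resolution cost grows with the square of the number of positions, while the fixed-grid cost stays constant. Pooling and upsampling still scale linearly with the input size, which this estimate leaves out.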
Abstract
Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction. While existing vision transformers demonstrate promising performance, they often utilize high-resolution context modeling, resulting in a computational bottleneck. In this work, we challenge conventional wisdom and introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost, i.e., FLOPs. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution, with additional 3x3 depth-wise convolutions to capture fine details in the high-resolution space. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure. Extensive experiments on…
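The core idea in the abstract, attention computed on a fixed-size grid and then brought back to the input resolution, can be sketched in a few lines of NumPy. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the function names, the average pooling, the nearest-neighbour upsampling, the residual addition, and the default `pool_size` are all choices made here for brevity, and the 3x3 depth-wise convolutions mentioned in the abstract are omitted.

```python
import numpy as np


def adaptive_avg_pool(x: np.ndarray, out_size: int) -> np.ndarray:
    """Average-pool an (H, W, C) map down to (out_size, out_size, C).

    Assumes H and W are multiples of out_size to keep the sketch short.
    """
    h, w, c = x.shape
    assert h % out_size == 0 and w % out_size == 0
    blocks = x.reshape(out_size, h // out_size, out_size, w // out_size, c)
    return blocks.mean(axis=(1, 3))


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def low_res_self_attention(x, w_q, w_k, w_v, pool_size=4):
    """Single-head attention on a fixed pool_size x pool_size grid (hypothetical helper).

    x: (H, W, C) feature map; w_q, w_k, w_v: (C, C) projection matrices.
    """
    h, w, c = x.shape
    pooled = adaptive_avg_pool(x, pool_size)             # (p, p, C)
    tokens = pooled.reshape(pool_size * pool_size, c)    # (p*p, C)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    attn = softmax(q @ k.T / np.sqrt(c))                 # (p*p, p*p), independent of H and W
    out = (attn @ v).reshape(pool_size, pool_size, c)
    # Nearest-neighbour upsampling back to (H, W, C); an assumption made for simplicity.
    out = out.repeat(h // pool_size, axis=0).repeat(w // pool_size, axis=1)
    # Residual addition keeps the high-resolution input in the output (assumed here).
    return x + out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_q, w_k, w_v = (rng.standard_normal((6, 6)) * 0.1 for _ in range(3))
    for side in (8, 16):
        feats = rng.standard_normal((side, side, 6))
        print(side, low_res_self_attention(feats, w_q, w_k, w_v).shape)
```

Note that the attention matrix has the same shape for the 8x8 and the 16x16 input, which is the property that decouples the attention cost from the input resolution.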
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
Methods: Multi-Head Attention · Linear Layer · Residual Connection · Layer Normalization · Softmax · Attention Is All You Need · Dense Connections · Vision Transformer
