Low-Resolution Self-Attention for Semantic Segmentation

Yu-Huan Wu; Shi-Chen Zhang; Yun Liu; Le Zhang; Xin Zhan; Daquan Zhou; Jiashi Feng; Ming-Ming Cheng; Liangli Zhen

arXiv:2310.05026·cs.CV·June 17, 2025·2 cites

TL;DR

This paper introduces Low-Resolution Self-Attention (LRSA), a novel mechanism that captures global context efficiently for semantic segmentation by computing self-attention at a fixed low resolution, reducing computational costs while maintaining high performance.

Contribution

The paper proposes LRSA, a new self-attention mechanism that reduces computational complexity in vision transformers for segmentation tasks, and demonstrates its effectiveness with the LRFormer model.

Findings

01. LRFormer outperforms state-of-the-art models on multiple datasets.

02. LRSA significantly reduces FLOPs compared to traditional high-resolution self-attention.

03. The approach maintains high segmentation accuracy with lower computational cost.

Abstract

Semantic segmentation tasks naturally require high-resolution information for pixel-wise segmentation and global context information for class prediction. While existing vision transformers demonstrate promising performance, they often utilize high-resolution context modeling, resulting in a computational bottleneck. In this work, we challenge conventional wisdom and introduce the Low-Resolution Self-Attention (LRSA) mechanism to capture global context at a significantly reduced computational cost, i.e., FLOPs. Our approach involves computing self-attention in a fixed low-resolution space regardless of the input image's resolution, with additional 3x3 depth-wise convolutions to capture fine details in the high-resolution space. We demonstrate the effectiveness of our LRSA approach by building the LRFormer, a vision transformer with an encoder-decoder structure. Extensive experiments on…
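The cost argument in the abstract can be made concrete with a rough operation count. The sketch below is not taken from the paper or its repository: it assumes a single-head attention layer, counts only multiply-accumulates for the projections and the two attention matrix products, and uses a hypothetical pooled grid of 16 x 16 tokens (`pool_size=16` is an illustrative assumption, as are all function names). It compares attention over every position of a feature map with attention over a fixed-size pooled grid plus a 3x3 depth-wise convolution at the original feature-map resolution.

```python
def attention_macs(num_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for one single-head self-attention layer."""
    projections = 4 * num_tokens * dim * dim      # Q, K, V and output projections
    scores = num_tokens * num_tokens * dim        # Q @ K^T
    weighted_sum = num_tokens * num_tokens * dim  # softmax(scores) @ V
    return projections + scores + weighted_sum


def full_resolution_macs(height: int, width: int, dim: int) -> int:
    """Attention computed over every position of a height x width feature map."""
    return attention_macs(height * width, dim)


def low_resolution_macs(height: int, width: int, dim: int, pool_size: int = 16) -> int:
    """Attention on a fixed pool_size x pool_size grid plus a 3x3 depth-wise conv.

    pool_size is a hypothetical value chosen for illustration only.
    """
    attention = attention_macs(pool_size * pool_size, dim)  # independent of input size
    depthwise_conv = height * width * dim * 9               # 3x3 kernel, one filter per channel
    return attention + depthwise_conv


if __name__ == "__main__":
    dim = 64
    for side in (32, 64, 128):
        full = full_resolution_macs(side, side, dim)
        low = low_resolution_macs(side, side, dim)
        print(f"{side}x{side} feature map: full={full:,} MACs, low-res={low:,} MACs")
```

Under these assumptions the attention term of the low-resolution variant stays constant as the feature map grows, so only the depth-wise convolution scales with input size, and it does so linearly. The full-resolution variant grows quadratically in the number of positions.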

Code & Models

Repositories

yuhuan-wu/lrformer (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Linear Layer · Residual Connection · Layer Normalization · Softmax · Attention Is All You Need · Dense Connections · Vision Transformer