Multi-Scale Representations by Varying Window Attention for Semantic Segmentation
Haotian Yan, Ming Wu, Chuang Zhang

TL;DR
This paper introduces varying window attention (VWA), a multi-scale attention mechanism for semantic segmentation that lets the context window vary in scale at the same cost as local window attention (LWA), and proposes VWFormer, a multi-scale decoder built on it.
Contribution
The paper presents VWA, a scale-varying window attention method, and VWFormer, a multi-scale decoder, advancing multi-scale learning in semantic segmentation with efficiency and improved accuracy.
Findings
VWA captures multi-scale features at the same computational cost as local window attention (LWA).
VWFormer outperforms existing decoders like FPN and MLP with less computation.
The approach achieves 1.0%-2.5% higher mIoU on ADE20K compared to UPerNet.
Abstract
Multi-scale learning is central to semantic segmentation. We visualize the effective receptive field (ERF) of canonical multi-scale representations and point out two risks in learning them: scale inadequacy and field inactivation. A novel multi-scale learner, varying window attention (VWA), is presented to address these issues. VWA leverages the local window attention (LWA) and disentangles LWA into the query window and context window, allowing the context's scale to vary for the query to learn representations at multiple scales. However, varying the context to large-scale windows (enlarging ratio R) can significantly increase the memory footprint and computation cost (R^2 times larger than LWA). We propose a simple but professional re-scaling strategy to zero the extra induced cost without compromising performance. Consequently, VWA uses the same cost as LWA to overcome the receptive…
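The R^2 cost factor mentioned in the abstract can be checked with a small counting sketch. It assumes a simplified cost model in which attention cost is proportional to the number of query-key pairs per window, ignoring the channel dimension and the projection layers. The function names and the window size of 7 are hypothetical, chosen only for illustration. The sketch does not implement the paper's re-scaling strategy.

```python
def window_attention_pairs(window: int, ratio: int = 1) -> int:
    """Count query-key pairs for one window (simplified cost model, an assumption).

    The query window holds window x window tokens. The context window is
    enlarged by `ratio` along each side, so it holds (ratio * window)^2 tokens.
    ratio == 1 corresponds to plain local window attention (LWA).
    """
    num_queries = window * window
    num_keys = (ratio * window) ** 2
    return num_queries * num_keys


def cost_ratio(window: int, ratio: int) -> float:
    """Cost of an enlarged context relative to LWA under the same model."""
    return window_attention_pairs(window, ratio) / window_attention_pairs(window, 1)


if __name__ == "__main__":
    # Hypothetical window size of 7 with a few enlarging ratios R.
    for r in (1, 2, 4, 8):
        print(f"R={r}: {cost_ratio(7, r):.0f}x the LWA pair count")
```

Under this model the ratio depends only on R, not on the window size, which is why the overhead is stated as R^2 regardless of how the windows are sized.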
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Semantic Web and Ontologies · Image Retrieval and Classification Techniques
Methods: 1x1 Convolution · Convolution · Feature Pyramid Network
