Sparse Spatial Attention Network for Semantic Segmentation
Mengyu Liu, Hujun Yin

TL;DR
This paper introduces SSANet, a sparse spatial attention network for semantic segmentation. It captures long-range dependencies efficiently by adaptively sampling a subset of key and value elements for each query, and reports state-of-the-art results on Cityscapes, PASCAL Context and ADE20K.
Contribution
The paper proposes a sparse non-local (SNL) block that samples a subset of key and value elements for each query element. The resulting sparse affinity matrix reduces the computational cost of spatial attention without sacrificing performance.
Findings
Outperforms other context aggregation methods
Achieves state-of-the-art performance on Cityscapes, PASCAL Context and ADE20K
Efficient long-range dependency modeling
Abstract
The spatial attention mechanism captures long-range dependencies by aggregating global contextual information to each query location, which is beneficial for semantic segmentation. In this paper, we present a sparse spatial attention network (SSANet) to improve the efficiency of the spatial attention mechanism without sacrificing the performance. Specifically, a sparse non-local (SNL) block is proposed to sample a subset of key and value elements for each query element to capture long-range relations adaptively and generate a sparse affinity matrix to aggregate contextual information efficiently. Experimental results show that the proposed approach outperforms other context aggregation methods and achieves state-of-the-art performance on the Cityscapes, PASCAL Context and ADE20K datasets.
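The contrast between dense spatial attention and a sparse variant can be sketched in a few lines of plain Python. This is only an illustration of the general idea described in the abstract, not the authors' SNL block: the abstract does not say how key and value elements are sampled, so uniform random sampling is used here as a stand-in, and all function names (`attend`, `dense_attention`, `sparse_attention`, `random_sample_indices`) are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Aggregate `values` for one query using scaled dot-product affinities."""
    scale = math.sqrt(len(query))
    logits = [sum(a * b for a, b in zip(query, key)) / scale for key in keys]
    weights = softmax(logits)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def dense_attention(queries, keys, values):
    """Every query attends to all N keys: an N x N affinity matrix."""
    return [attend(q, keys, values) for q in queries]


def sparse_attention(queries, keys, values, sampled_idx):
    """Each query attends only to its own sampled keys: an N x S affinity matrix.

    sampled_idx[i] lists the key/value positions used by query i.
    """
    return [
        attend(q, [keys[j] for j in idx], [values[j] for j in idx])
        for q, idx in zip(queries, sampled_idx)
    ]


def random_sample_indices(num_queries, num_keys, num_samples, rng):
    """Placeholder sampler (assumption): uniform random positions per query.

    The paper's sampling is described as adaptive; this does not reproduce it.
    """
    return [rng.sample(range(num_keys), num_samples) for _ in range(num_queries)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, channels, samples = 16, 8, 4  # 16 spatial positions, 4 sampled keys each
    feats = [[rng.gauss(0.0, 1.0) for _ in range(channels)] for _ in range(n)]
    idx = random_sample_indices(n, n, samples, rng)
    out = sparse_attention(feats, feats, feats, idx)
    print("affinity entries, dense: ", n * n)
    print("affinity entries, sparse:", n * samples)
    print("output shape:", len(out), "x", len(out[0]))
```

With N positions and S sampled keys per query, the affinity matrix shrinks from N × N entries to N × S, which is where the efficiency gain in this sketch comes from. Passing every position as the sample set for every query recovers dense attention exactly.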
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications · Remote-Sensing Image Classification
