STDC-MA Network for Semantic Segmentation
Xiaochun Lei, Linjun Lu, Zetao Jiang, Zhaoting Gong, Chang Lu, Jiaming, Liang

TL;DR
The paper introduces the STDC-MA network, combining lightweight structure and multiscale attention to enhance semantic segmentation accuracy, especially for small objects, while maintaining high speed.
Contribution
It proposes a novel STDC-MA network integrating feature alignment and hierarchical multiscale attention for improved segmentation performance.
Findings
Achieved 76.81% mIOU on Cityscapes with 0.5x input scale.
Improved small object segmentation accuracy.
Maintained high segmentation speed.
Abstract
Semantic segmentation is applied extensively in autonomous driving and intelligent transportation with methods that highly demand spatial and semantic information. Here, an STDC-MA network is proposed to meet these demands. First, the STDC-Seg structure is employed in STDC-MA to ensure a lightweight and efficient structure. Subsequently, the feature alignment module (FAM) is applied to understand the offset between high-level and low-level features, solving the problem of pixel offset related to upsampling on the high-level feature map. Our approach implements the effective fusion between high-level features and low-level features. A hierarchical multiscale attention mechanism is adopted to reveal the relationship among attention regions from two different input sizes of one image. Through this relationship, regions receiving much attention are integrated into the segmentation results,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
MethodsSPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
