Dilated SpineNet for Semantic Segmentation
Abdullah Rashwan, Xianzhi Du, Xiaoqi Yin, Jing Li

TL;DR
This paper introduces SpineNet-Seg, a scale-permuted network with dilated convolutions discovered by neural architecture search (NAS). It reaches 83.04% mIoU on Cityscapes and 85.56% mIoU on PASCAL VOC2012, and it outperforms the DeepLabv3/v3+ baselines in both speed and accuracy.
Contribution
The paper proposes SpineNet-Seg, a NAS-designed, scale-permuted network with customized dilation ratios per block for semantic segmentation. It outperforms the DeepLabv3/v3+ baselines at all model scales.
Findings
Achieves 83.04% mIoU on Cityscapes
Attains 85.56% mIoU on PASCAL VOC2012
Outperforms DeepLabv3/v3+ baselines in speed and accuracy
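The findings above are reported as mIoU (mean intersection over union). As a rough illustration of how that metric is usually computed, here is a minimal NumPy sketch based on a confusion matrix. The function names, the `ignore_label=255` convention, and the choice to average only over classes that appear in the data are assumptions made for this example; the paper's exact evaluation protocol is not described on this page.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Count (true class, predicted class) pairs over all labeled pixels.

    Assumption: pixels whose ground truth equals `ignore_label` are skipped,
    and all remaining labels and predictions lie in [0, num_classes).
    """
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    keep = y_true != ignore_label
    # Encode each (true, pred) pair as a single index, then histogram it.
    idx = num_classes * y_true[keep].astype(np.int64) + y_pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mean_iou(cm):
    """Average per-class IoU = TP / (TP + FP + FN) over classes that occur."""
    tp = np.diag(cm).astype(np.float64)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0  # skip classes absent from both labels and predictions
    return float((tp[present] / union[present]).mean())


# Toy usage: four pixels, two classes.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
print(mean_iou(cm))
```

In the toy example, class 0 has IoU 1/2 and class 1 has IoU 2/3, so the mean is their average.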
Abstract
Scale-permuted networks have shown promising results on object bounding box detection and instance segmentation. Scale permutation and cross-scale fusion of features enable the network to capture multi-scale semantics while preserving spatial resolution. In this work, we evaluate this meta-architecture design on semantic segmentation, another vision task that benefits from high spatial resolution and multi-scale feature fusion at different network stages. By further leveraging dilated convolution operations, we propose SpineNet-Seg, a network discovered by NAS that is searched from the DeepLabv3 system. SpineNet-Seg is designed with a better scale-permuted network topology with customized dilation ratios per block on a semantic segmentation task. SpineNet-Seg models outperform the DeepLabv3/v3+ baselines at all model scales on multiple popular benchmarks in speed and accuracy. In…
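The abstract relies on dilated (atrous) convolution, which spaces the kernel taps apart so that a layer covers a wider input window without adding weights or reducing resolution. The following is a minimal one-dimensional NumPy sketch of the operation, not the paper's implementation. The function name `dilated_conv1d` and the toy inputs are hypothetical, and the per-block dilation ratios found by the search are not listed on this page.

```python
import numpy as np


def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D dilated convolution (cross-correlation, as in deep learning).

    out[j] = sum_i w[i] * x[j + i * dilation]
    """
    x = np.asarray(x, dtype=np.float64)
    w = np.asarray(w, dtype=np.float64)
    k = len(w)
    # Input span covered by one output value (the receptive field).
    span = dilation * (k - 1) + 1
    out_len = len(x) - span + 1
    if out_len <= 0:
        raise ValueError("input is shorter than the dilated kernel span")
    out = np.zeros(out_len)
    for i in range(k):
        # Tap i reads the input shifted by i * dilation positions.
        out += w[i] * x[i * dilation : i * dilation + out_len]
    return out


x = np.arange(7)            # [0, 1, 2, 3, 4, 5, 6]
w = np.ones(3)              # 3-tap kernel
print(dilated_conv1d(x, w, dilation=1))  # each output sees 3 adjacent inputs
print(dilated_conv1d(x, w, dilation=2))  # each output sees a 5-wide window
```

With the same three weights, a dilation of 2 widens the window each output covers from 3 to 5 input positions, which is how dilation enlarges context while the feature map keeps its spatial resolution.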
Taxonomy
Topics: Advanced Neural Network Applications · Automated Road and Building Extraction · Video Surveillance and Tracking Methods
Methods: 1x1 Convolution · Spatial Pyramid Pooling · Batch Normalization · Atrous Spatial Pyramid Pooling · Convolution · DeepLabv3 · Dilated Convolution
