Simple and Efficient Architectures for Semantic Segmentation
Dushyant Mehta, Andrii Skliar, Haitam Ben Yahia, Shubhankar Borse, Fatih Porikli, Amirhossein Habibian, Tijmen Blankevoort

TL;DR
This paper demonstrates that simple encoder-decoder architectures with modified ResNet backbones can match or outperform complex models in semantic segmentation, offering efficient and practical solutions for both desktop and mobile applications.
Contribution
The authors introduce simple, efficient encoder-decoder architectures whose ResNet backbones are lightly modified to enlarge the receptive field. These architectures perform on par with or better than complex models such as HRNet.
Findings
Simple architectures match or surpass complex models on Cityscapes.
Enlarging receptive fields with minor modifications improves segmentation performance.
Proposed models are suitable for both desktop and mobile devices.
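The receptive-field finding above can be made concrete with a little arithmetic. The sketch below computes the theoretical receptive field of a chain of convolution or pooling layers using the standard recurrence (each layer adds its effective kernel extent minus one, scaled by the accumulated stride). The layer stacks and the names `receptive_field`, `stem`, `plain`, `deeper`, and `dilated` are hypothetical illustrations and are not taken from the paper. The paper discusses the effective receptive field, which is typically smaller than the theoretical value computed here. Adding layers and adding dilation are shown only as two generic ways to enlarge the field, not as the authors' specific modification.

```python
def receptive_field(layers):
    """Theoretical receptive field of a chain of conv/pool layers.

    layers: iterable of (kernel_size, stride, dilation) tuples, input to output.
    Returns (receptive_field, jump), both measured in input pixels, where
    jump is the distance between adjacent output positions.
    """
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


# Hypothetical stacks for illustration only (not the paper's configurations).
# ResNet-style stem: 7x7 stride-2 conv followed by 3x3 stride-2 pooling.
stem = [(7, 2, 1), (3, 2, 1)]
plain = stem + [(3, 1, 1)] * 4    # four 3x3 convs
deeper = stem + [(3, 1, 1)] * 8   # eight 3x3 convs
dilated = stem + [(3, 1, 2)] * 4  # four 3x3 convs with dilation 2

for name, stack in [("plain", plain), ("deeper", deeper), ("dilated", dilated)]:
    rf, jump = receptive_field(stack)
    print(f"{name}: receptive field {rf} px, output stride {jump}")
```

In this toy setting, doubling the number of 3x3 layers and doubling their dilation enlarge the theoretical receptive field by the same amount, while the output stride is unchanged.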
Abstract
Though the state-of-the-art architectures for semantic segmentation, such as HRNet, demonstrate impressive accuracy, the complexity arising from their salient design choices hinders a range of model acceleration tools, and further they make use of operations that are inefficient on current hardware. This paper demonstrates that a simple encoder-decoder architecture with a ResNet-like backbone and a small multi-scale head performs on-par or better than complex semantic segmentation architectures such as HRNet, FANet and DDRNets. Naively applying deep backbones designed for Image Classification to the task of Semantic Segmentation leads to sub-par results, owing to a much smaller effective receptive field of these backbones. Implicit among the various design choices put forth in works like HRNet, DDRNet, and FANet are networks with a large effective receptive field. It is natural to ask if a…
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Visual Attention and Saliency Detection
Methods: Residual Connection · Batch Normalization · Convolution · HRNet
