DeeperLab: Single-Shot Image Parser
Tien-Ju Yang, Maxwell D. Collins, Yukun Zhu, Jyh-Jing Hwang, Ting Liu, Xiao Zhang, Vivienne Sze, George Papandreou, Liang-Chieh Chen

TL;DR
DeeperLab introduces a fully convolutional, single-shot approach to panoptic segmentation that handles semantic and instance segmentation jointly, achieving competitive accuracy at near real-time speed.
Contribution
The paper presents a fully convolutional, single-shot model for panoptic segmentation that simplifies the pipeline and runs faster than prior multi-stage methods.
Findings
Achieves 31.95% PQ on the Mapillary Vistas dataset
Runs at near real-time speed (22.6 fps) on a GPU
Introduces the region-based Parsing Covering (PC) metric
Abstract
We present a single-shot, bottom-up approach for whole image parsing. Whole image parsing, also known as Panoptic Segmentation, generalizes the tasks of semantic segmentation for 'stuff' classes and instance segmentation for 'thing' classes, assigning both semantic and instance labels to every pixel in an image. Recent approaches to whole image parsing typically employ separate standalone modules for the constituent semantic and instance segmentation tasks and require multiple passes of inference. Instead, the proposed DeeperLab image parser performs whole image parsing with a significantly simpler, fully convolutional approach that jointly addresses the semantic and instance segmentation tasks in a single-shot manner, resulting in a streamlined system that better lends itself to fast processing. For quantitative evaluation, we use both the instance-based Panoptic Quality (PQ) metric…
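For readers unfamiliar with the PQ metric named above, the following is a minimal sketch of how Panoptic Quality is commonly defined in the panoptic segmentation literature for a single class: the summed IoU of matched segment pairs divided by TP + 0.5·FP + 0.5·FN, where a ground-truth segment and a predicted segment match when their IoU exceeds 0.5. It is not taken from the DeeperLab code. The function names and the representation of segments as sets of pixel coordinates are assumptions made for illustration. The Parsing Covering metric is not sketched because its definition does not appear in this excerpt.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]  # (row, col); hypothetical representation of a pixel


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt_segments: List[Set[Pixel]],
                     pred_segments: List[Set[Pixel]]) -> float:
    """PQ for one class, assuming segments within each list do not overlap.

    With non-overlapping segments, an IoU above 0.5 can hold for at most
    one prediction per ground-truth segment, so greedy matching suffices.
    """
    matched_pred = set()  # indices of predictions already matched
    iou_sum = 0.0
    true_pos = 0
    for gt in gt_segments:
        for j, pred in enumerate(pred_segments):
            if j in matched_pred:
                continue
            score = iou(gt, pred)
            if score > 0.5:
                matched_pred.add(j)
                iou_sum += score
                true_pos += 1
                break
    false_pos = len(pred_segments) - true_pos  # unmatched predictions
    false_neg = len(gt_segments) - true_pos    # unmatched ground truth
    denom = true_pos + 0.5 * false_pos + 0.5 * false_neg
    return iou_sum / denom if denom else 0.0


if __name__ == "__main__":
    gt = [{(0, 0), (0, 1), (1, 0), (1, 1)}, {(5, 5)}]
    pred = [{(0, 0), (0, 1), (1, 0)}, {(9, 9)}]
    print(panoptic_quality(gt, pred))  # one match with IoU 0.75, one FP, one FN
```

In this toy input, the single matched pair has IoU 3/4, and there is one unmatched prediction and one unmatched ground-truth segment, so PQ = 0.75 / (1 + 0.5 + 0.5) = 0.375. A full evaluation would average the per-class values over all classes.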
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: pc · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
