DeeperLab: Single-Shot Image Parser

Tien-Ju Yang; Maxwell D. Collins; Yukun Zhu; Jyh-Jing Hwang; Ting Liu; Xiao Zhang; Vivienne Sze; George Papandreou; Liang-Chieh Chen

arXiv:1902.05093·cs.CV·March 14, 2019·162 cites

Open Access

TL;DR

DeeperLab introduces a fully convolutional, single-shot approach to panoptic segmentation that handles semantic and instance segmentation jointly, reaching competitive accuracy at near real-time processing speeds.

Contribution

The paper presents a fully convolutional, single-shot model for panoptic segmentation that simplifies the pipeline and runs faster than prior multi-stage methods, which need multiple passes of inference.

Findings

01. Achieves 31.95% PQ on the Mapillary Vistas dataset

02. Runs at a near real-time speed of 22.6 fps on GPU

03. Introduces the region-based Parsing Covering (PC) metric
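To make the region-based Parsing Covering metric more concrete, here is a minimal Python sketch. It assumes the usual reading of the metric: for each class, every ground-truth region is scored by its best IoU against the predicted regions of the same class, the scores are weighted by region size, and the per-class results are averaged. The function names, the representation of regions as sets of pixel ids, and the toy data are illustrative assumptions, not code from the paper, and details such as void-label handling are left out.

```python
def iou(a, b):
    """Intersection over union of two regions given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def parsing_covering(gt_by_class, pred_by_class):
    """Size-weighted covering, averaged over classes (simplified sketch).

    gt_by_class / pred_by_class: dict mapping class name -> list of regions,
    where each region is a set of pixel ids (hypothetical representation).
    """
    per_class = []
    for cls, gt_regions in gt_by_class.items():
        total_pixels = sum(len(r) for r in gt_regions)
        if total_pixels == 0:
            continue  # class has no ground-truth pixels; skip it
        preds = pred_by_class.get(cls, [])
        # Each ground-truth region contributes its best IoU, weighted by size.
        weighted = sum(
            len(r) * max((iou(r, p) for p in preds), default=0.0)
            for r in gt_regions
        )
        per_class.append(weighted / total_pixels)
    return sum(per_class) / len(per_class) if per_class else 0.0


# Toy example: one large and one small "car" region, plus one "road" region.
gt = {
    "car": [set(range(0, 90)), set(range(90, 100))],
    "road": [set(range(100, 200))],
}
pred = {
    "car": [set(range(0, 90))],        # the small car is missed entirely
    "road": [set(range(100, 150))],    # half of the road is recovered
}
pc = parsing_covering(gt, pred)
print(round(pc, 3))
```

In this toy case the missed 10-pixel car only lowers the "car" score to 0.9, because regions are weighted by area. An instance-based metric would count that miss as a full false negative regardless of its size.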

Abstract

We present a single-shot, bottom-up approach for whole image parsing. Whole image parsing, also known as Panoptic Segmentation, generalizes the tasks of semantic segmentation for 'stuff' classes and instance segmentation for 'thing' classes, assigning both semantic and instance labels to every pixel in an image. Recent approaches to whole image parsing typically employ separate standalone modules for the constituent semantic and instance segmentation tasks and require multiple passes of inference. Instead, the proposed DeeperLab image parser performs whole image parsing with a significantly simpler, fully convolutional approach that jointly addresses the semantic and instance segmentation tasks in a single-shot manner, resulting in a streamlined system that better lends itself to fast processing. For quantitative evaluation, we use both the instance-based Panoptic Quality (PQ) metric…
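The instance-based Panoptic Quality metric mentioned at the end of the abstract can be sketched in a few lines. The sketch below assumes the standard definition: a predicted segment matches a ground-truth segment of the same class when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by |TP| + 0.5 |FP| + 0.5 |FN|. The function names and the set-of-pixel-ids representation are hypothetical, only a single class is handled, and void or crowd regions are ignored.

```python
def iou(a, b):
    """Intersection over union of two segments given as sets of pixel ids."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def panoptic_quality(gt_segments, pred_segments):
    """PQ for a single class (simplified sketch).

    Segments are non-overlapping sets of pixel ids, so at most one
    prediction can exceed IoU 0.5 with any given ground-truth segment.
    """
    matched = set()   # indices of predictions already used as true positives
    iou_sum = 0.0
    for g in gt_segments:
        for j, p in enumerate(pred_segments):
            if j in matched:
                continue
            score = iou(g, p)
            if score > 0.5:
                matched.add(j)
                iou_sum += score
                break
    tp = len(matched)
    fp = len(pred_segments) - tp   # predictions with no match
    fn = len(gt_segments) - tp     # ground-truth segments with no match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Toy example: one good match (IoU 0.8), one missed segment, one spurious one.
gt = [set(range(0, 10)), set(range(10, 20))]
pred = [set(range(0, 8)), set(range(30, 34))]
pq = panoptic_quality(gt, pred)
print(round(pq, 3))
```

Here TP = 1, FP = 1 and FN = 1, so PQ = 0.8 / (1 + 0.5 + 0.5) = 0.4. A dataset-level PQ would average this per-class value over all classes.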

Taxonomy

Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: pc · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings