Video Panoptic Segmentation
Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

TL;DR
This paper introduces video panoptic segmentation, a task that extends panoptic segmentation (unified semantic and instance segmentation) to video by requiring consistent instance ids across frames. It contributes two datasets, a network (VPSNet), and an evaluation metric (VPQ), and reports state-of-the-art results.
Contribution
It defines the video panoptic segmentation task, builds two datasets for it (a reformatted VIPER and Cityscapes-VPS), and develops VPSNet, a model that jointly predicts panoptic segmentation and instance tracking in videos.
Findings
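Two video panoptic datasets are presented: the synthetic VIPER dataset reorganized into video panoptic format, and Cityscapes-VPS, a temporal extension of the Cityscapes val. set
VPSNet achieves state-of-the-art results on Cityscapes and VIPER
The proposed video panoptic quality (VPQ) metric evaluates segmentation and tracking quality jointly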
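To make the idea behind a VPQ-style metric concrete, here is a minimal sketch that applies the standard panoptic quality formula to "tubes" (the same segment id followed across the frames of one window). The function names, the dictionary layout, and the toy data are assumptions made for illustration and are not taken from the paper's code. The sketch scores a single temporal window and treats every class the same way, whereas the full metric, as described by the authors, also aggregates over several window sizes, which is not reproduced here.

```python
from collections import defaultdict


def tube_iou(a, b):
    """IoU of two tubes, each a set of (frame, row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def vpq_window(gt_tubes, pred_tubes, iou_threshold=0.5):
    """PQ computed on tubes for one window (hypothetical helper).

    gt_tubes / pred_tubes: dict mapping segment id -> (class_label, pixel_set).
    Returns (mean score over classes, per-class scores).
    """
    stats = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "iou": 0.0})
    matched_pred = set()

    for gt_cls, gt_pixels in gt_tubes.values():
        match = None
        for pred_id, (pred_cls, pred_pixels) in pred_tubes.items():
            if pred_cls != gt_cls or pred_id in matched_pred:
                continue
            iou = tube_iou(gt_pixels, pred_pixels)
            # With non-overlapping segments, IoU > 0.5 admits at most one match.
            if iou > iou_threshold:
                match = (pred_id, iou)
                break
        if match is None:
            stats[gt_cls]["fn"] += 1
        else:
            matched_pred.add(match[0])
            stats[gt_cls]["tp"] += 1
            stats[gt_cls]["iou"] += match[1]

    # Every unmatched predicted tube counts as a false positive.
    for pred_id, (pred_cls, _) in pred_tubes.items():
        if pred_id not in matched_pred:
            stats[pred_cls]["fp"] += 1

    per_class = {
        cls: s["iou"] / (s["tp"] + 0.5 * s["fp"] + 0.5 * s["fn"])
        for cls, s in stats.items()
    }
    mean = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return mean, per_class


# Toy example: one ground-truth car covering two pixels in each of two frames.
gt = {1: ("car", {(0, 0, 0), (0, 0, 1), (1, 0, 0), (1, 0, 1)})}

# Prediction A keeps one id across both frames but misses one pixel.
pred_consistent = {7: ("car", {(0, 0, 0), (0, 0, 1), (1, 0, 0)})}

# Prediction B has perfect masks but switches id between the frames.
pred_switched = {
    7: ("car", {(0, 0, 0), (0, 0, 1)}),
    8: ("car", {(1, 0, 0), (1, 0, 1)}),
}

score_consistent, _ = vpq_window(gt, pred_consistent)
score_switched, _ = vpq_window(gt, pred_switched)
```

In this toy case the consistent prediction scores 0.75, while the id-switched prediction scores 0.0 even though its per-frame masks are perfect: each half-tube reaches an IoU of only 0.5, which does not exceed the matching threshold. This is the sense in which a tube-based metric penalizes tracking errors as well as segmentation errors.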
Abstract
Panoptic segmentation has become a new standard of visual recognition task by unifying previous semantic segmentation and instance segmentation tasks in concert. In this paper, we propose and explore a new video extension of this task, called video panoptic segmentation. The task requires generating consistent panoptic segmentation as well as an association of instance ids across video frames. To invigorate research on this new task, we present two types of video panoptic datasets. The first is a re-organization of the synthetic VIPER dataset into the video panoptic format to exploit its large-scale pixel annotations. The second is a temporal extension on the Cityscapes val. set, by providing new video panoptic annotations (Cityscapes-VPS). Moreover, we propose a novel video panoptic segmentation network (VPSNet) which jointly predicts object classes, bounding boxes, masks, instance id…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Video Panoptic Segmentation Network
