Improving Semantic Segmentation via Video Propagation and Label Relaxation
Yi Zhu, Karan Sapra, Fitsum A. Reda, Kevin J. Shih, Shawn Newsam, Andrew Tao, Bryan Catanzaro

TL;DR
This paper introduces video prediction-based data augmentation and a boundary label relaxation technique to improve semantic segmentation accuracy, achieving state-of-the-art results on multiple datasets.
Contribution
It proposes a novel joint propagation strategy and boundary label relaxation to enhance training robustness and segmentation performance.
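To make the joint propagation idea concrete, here is a minimal sketch in plain Python. It assumes a per-pixel motion field is already available (in the paper it would come from a video prediction model, which is not reproduced here) and applies that same field to both the frame and its label map, so the synthesized pair stays aligned. The names `warp` and `joint_propagate`, the integer displacements, and the clamp-at-border behaviour are illustrative assumptions, not the authors' implementation.

```python
def warp(grid, flow):
    """Backward-warp a 2D grid: out[y][x] = grid[y + dy][x + dx].

    `flow[y][x]` is a hypothetical integer displacement (dy, dx);
    source coordinates are clamped to the grid border.
    """
    h, w = len(grid), len(grid[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(y + dy, 0), h - 1)  # clamp row index
            sx = min(max(x + dx, 0), w - 1)  # clamp column index
            row.append(grid[sy][sx])
        out.append(row)
    return out


def joint_propagate(image, label, flow):
    """Warp the frame and its label map with the SAME motion field."""
    return warp(image, flow), warp(label, flow)


# Toy example: every pixel samples from its left neighbour,
# so the content shifts one pixel to the right.
image = [[10, 20, 30], [40, 50, 60]]
label = [[0, 0, 1], [0, 1, 1]]
flow = [[(0, -1)] * 3 for _ in range(2)]
new_image, new_label = joint_propagate(image, label, flow)
```

Because both outputs come from one motion field, any error in the motion estimate moves the image and the label together, which is the intuition behind propagating them jointly rather than pairing a propagated label with a real future frame.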
Findings
Achieved 83.5% mIoU on Cityscapes
Surpassed the 2018 ROB challenge winning entry on KITTI
Significantly improved accuracy by training on datasets augmented with synthesized samples
Abstract
Semantic segmentation requires large amounts of pixel-wise annotations to learn accurate models. In this paper, we present a video prediction-based methodology to scale up training sets by synthesizing new training samples in order to improve the accuracy of semantic segmentation networks. We exploit video prediction models' ability to predict future frames in order to also predict future labels. A joint propagation strategy is also proposed to alleviate mis-alignments in synthesized samples. We demonstrate that training segmentation models on datasets augmented by the synthesized samples leads to significant improvements in accuracy. Furthermore, we introduce a novel boundary label relaxation technique that makes training robust to annotation noise and propagation artifacts along object boundaries. Our proposed methods achieve state-of-the-art mIoUs of 83.5% on Cityscapes and 82.9% on…
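The abstract does not spell out the boundary label relaxation loss, so the following is only a sketch under a stated assumption: at each pixel, the loss is the negative log of the summed predicted probability over all classes that appear in a small window of the label map around that pixel. Away from boundaries the window holds one class and this reduces to ordinary cross-entropy; at a boundary, predicting any of the adjacent classes is accepted. The function names and the `radius` parameter are hypothetical.

```python
import math


def neighborhood_classes(label, y, x, radius=1):
    """Set of class ids inside the (2*radius+1)^2 window around (y, x)."""
    h, w = len(label), len(label[0])
    classes = set()
    for yy in range(max(0, y - radius), min(h, y + radius + 1)):
        for xx in range(max(0, x - radius), min(w, x + radius + 1)):
            classes.add(label[yy][xx])
    return classes


def relaxed_pixel_loss(probs, classes):
    """-log of the total probability assigned to the allowed classes."""
    p = sum(probs[c] for c in classes)
    return -math.log(max(p, 1e-12))  # guard against log(0)


def relaxed_loss(prob_map, label, radius=1):
    """Mean relaxed loss over all pixels (assumed reduction)."""
    h, w = len(label), len(label[0])
    total = 0.0
    for y in range(h):
        for x in range(w):
            allowed = neighborhood_classes(label, y, x, radius)
            total += relaxed_pixel_loss(prob_map[y][x], allowed)
    return total / (h * w)


# Toy 1x4 label row with a boundary between classes 0 and 1.
label = [[0, 0, 1, 1]]
boundary_loss = relaxed_pixel_loss([0.5, 0.3, 0.2], neighborhood_classes(label, 0, 1))
interior_loss = relaxed_pixel_loss([0.5, 0.3, 0.2], neighborhood_classes(label, 0, 0))
```

In this toy case the boundary pixel is scored on the combined probability of classes 0 and 1, while the interior pixel is scored on class 0 alone, so an uncertain prediction along the boundary is penalized less than the same prediction inside an object.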
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
