Predicting Deeper into the Future of Semantic Segmentation
Pauline Luc, Natalia Neverova, Camille Couprie, Jakob Verbeek, Yann LeCun

TL;DR
This paper introduces a novel task of predicting future semantic segmentation maps in video frames, demonstrating that direct prediction outperforms RGB-based methods, with promising results up to half a second ahead.
Contribution
The paper presents the first approach for directly predicting future semantic segmentations using an autoregressive CNN, outperforming RGB-based prediction methods.
Findings
Direct segmentation prediction outperforms RGB prediction methods.
Prediction accuracy is significantly better than that of an optical flow warping baseline.
Results are visually convincing up to half a second in the future.
Abstract
The ability to predict and therefore to anticipate the future is an important attribute of intelligence. It is also of utmost importance in real-time systems, e.g. in robotics or autonomous driving, which depend on visual scene understanding for decision making. While prediction of the raw RGB pixel values in future video frames has been studied in previous work, here we introduce the novel task of predicting semantic segmentations of future frames. Given a sequence of video frames, our goal is to predict segmentation maps of not yet observed video frames that lie up to a second or further in the future. We develop an autoregressive convolutional neural network that learns to iteratively generate multiple frames. Our results on the Cityscapes dataset show that directly predicting future segmentations is substantially better than predicting and then segmenting future RGB frames.…
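As an illustration of the "iteratively generate multiple frames" idea in the abstract, here is a minimal sketch of an autoregressive rollout. It is not the authors' implementation: the function names (`rollout`, `copy_last`), the fixed-length input window, and the array layout are all assumptions made for the example, and the predictor is a trivial stand-in for a trained CNN.

```python
import numpy as np

def rollout(predict_next, context, n_steps):
    """Autoregressively predict n_steps future segmentation maps.

    predict_next: callable mapping a (T, C, H, W) array of the T most recent
                  maps to one (C, H, W) map. Hypothetical stand-in for the CNN.
    context:      (T, C, H, W) array of observed segmentation maps
                  (assumed layout: time, classes, height, width).
    """
    window = np.asarray(context, dtype=np.float64)
    outputs = []
    for _ in range(n_steps):
        nxt = predict_next(window)
        outputs.append(nxt)
        # Slide the window: drop the oldest map and feed the prediction back in.
        window = np.concatenate([window[1:], nxt[None]], axis=0)
    return np.stack(outputs, axis=0)

def copy_last(window):
    # Placeholder predictor (assumption): repeat the most recent map.
    return window[-1]

# Usage: 4 observed maps, 3 classes, 2x2 pixels; predict 3 steps ahead.
observed = np.random.rand(4, 3, 2, 2)
future = rollout(copy_last, observed, n_steps=3)
```

Because each prediction is fed back as input, any error made at one step is carried into the following steps, which is one reason results tend to degrade as the prediction horizon grows.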
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Visual Attention and Saliency Detection
