Improving Semantic Segmentation through Spatio-Temporal Consistency Learned from Videos

Ankita Pasad; Ariel Gordon; Tsung-Yi Lin; Anelia Angelova

arXiv:2004.05324·cs.CV·May 22, 2020·1 cites

TL;DR

This paper introduces a method that uses unsupervised learning of 3D geometry and motion from videos to improve semantic segmentation accuracy and reduce labeling requirements by enforcing spatio-temporal consistency.

Contribution

It proposes using depth, egomotion, and camera intrinsics, all learned without supervision from video, as an additional supervision signal for a single-image segmentation model, either improving its quality or reducing the number of labels it needs.

Findings

01 Significant improvement in segmentation quality on the ScanNet dataset

02 Reduced need for labeled data when training segmentation models

03 Effective enforcement of 3D-geometric and temporal consistency of segmentation masks across video frames

Abstract

We leverage unsupervised learning of depth, egomotion, and camera intrinsics to improve the performance of single-image semantic segmentation, by enforcing 3D-geometric and temporal consistency of segmentation masks across video frames. The predicted depth, egomotion, and camera intrinsics are used to provide an additional supervision signal to the segmentation model, significantly enhancing its quality, or, alternatively, reducing the number of labels the segmentation model needs. Our experiments were performed on the ScanNet dataset.
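
The consistency idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the nearest-neighbour sampling, and the L1 penalty are all assumptions chosen for brevity. Given a depth map for frame A, camera intrinsics K, and the relative camera motion (R_ab, t_ab) from A to B, each pixel of A is lifted to 3D, moved into B's camera frame, and projected back to pixel coordinates. The segmentation probabilities predicted for A are then compared with those predicted for B at the reprojected locations.

```python
import numpy as np


def reproject_pixels(depth_a, K, R_ab, t_ab):
    """Map every pixel of frame A to its (u, v) location and depth in frame B.

    depth_a: (H, W) depth map of frame A
    K:       (3, 3) camera intrinsics matrix
    R_ab:    (3, 3) rotation from A's camera frame to B's
    t_ab:    (3,)   translation from A's camera frame to B's
    """
    h, w = depth_a.shape
    v, u = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Homogeneous pixel coordinates, shape (3, H*W).
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(np.float64)
    # Back-project to 3D points in A's camera frame.
    points_a = (np.linalg.inv(K) @ pix) * depth_a.reshape(1, -1)
    # Apply the egomotion to express the points in B's camera frame.
    points_b = R_ab @ points_a + t_ab.reshape(3, 1)
    z_b = points_b[2]
    proj = K @ points_b
    # Avoid division by zero; such points are masked out by the caller.
    safe_z = np.where(z_b > 1e-6, z_b, 1.0)
    u_b = proj[0] / safe_z
    v_b = proj[1] / safe_z
    return u_b.reshape(h, w), v_b.reshape(h, w), z_b.reshape(h, w)


def consistency_loss(probs_a, probs_b, depth_a, K, R_ab, t_ab):
    """Mean L1 gap between A's class probabilities and B's, sampled at reprojected pixels.

    probs_a, probs_b: (H, W, C) per-pixel class probabilities.
    The L1 penalty and nearest-neighbour sampling are illustrative assumptions.
    """
    h, w, _ = probs_a.shape
    u_b, v_b, z_b = reproject_pixels(depth_a, K, R_ab, t_ab)
    ui = np.rint(u_b).astype(int)
    vi = np.rint(v_b).astype(int)
    # Keep only points in front of camera B that land inside its image.
    valid = (z_b > 1e-6) & (ui >= 0) & (ui < w) & (vi >= 0) & (vi < h)
    if not valid.any():
        return 0.0
    warped_b = probs_b[vi[valid], ui[valid]]
    return float(np.abs(probs_a[valid] - warped_b).mean())
```

In a training setup, a differentiable version of such a term (bilinear sampling in an autodiff framework) would be added to the usual supervised segmentation loss, so that unlabeled video frames also contribute a training signal.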

Taxonomy

Topics: Advanced Vision and Imaging · Image Enhancement Techniques · Image Processing Techniques and Applications