MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion

Angel Villar-Corrales; Moritz Austermann; Sven Behnke

arXiv:2405.19921 · cs.CV · September 6, 2024

TL;DR

MCDS-VSS introduces a self-supervised structured filtering approach that estimates scene geometry, ego-motion, and object motion to enhance temporal consistency in video semantic segmentation for autonomous systems.

Contribution

The paper presents a novel self-supervised model that explicitly learns interpretable scene representations to improve temporal consistency in video segmentation.

Findings

01. Achieves superior temporal consistency in automotive video sequences.

02. Maintains competitive semantic segmentation accuracy.

03. Effectively decouples scene geometry, ego-motion, and object motion.

Abstract

Autonomous systems, such as self-driving cars, rely on reliable semantic environment perception for decision making. Despite great advances in video semantic segmentation, existing approaches ignore important inductive biases and lack structured and interpretable internal representations. In this work, we propose MCDS-VSS, a structured filter model that learns in a self-supervised manner to estimate scene geometry and ego-motion of the camera, while also estimating the motion of external objects. Our model leverages these representations to improve the temporal consistency of semantic segmentation without sacrificing segmentation accuracy. MCDS-VSS follows a prediction-fusion approach in which scene geometry and camera motion are first used to compensate for ego-motion, then residual flow is used to compensate motion of dynamic objects, and finally the predicted scene features are fused…
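The ego-motion compensation step described in the abstract can be illustrated with standard pinhole reprojection: given a depth map and the relative camera pose, each pixel is back-projected to 3D, moved by the camera motion, and projected again. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names `rigid_flow` and `total_flow`, the intrinsics matrix `K`, the 4x4 pose `T` (assumed to map points from the source camera frame to the target camera frame), and the additive combination with a residual flow are all hypothetical choices made for illustration.

```python
import numpy as np

def rigid_flow(depth, K, T):
    """Pixel displacement caused by camera motion alone (static scene assumed).

    depth: (H, W) depth map of the source frame.
    K:     (3, 3) pinhole camera intrinsics.
    T:     (4, 4) pose mapping source-frame points into the target frame.
    Returns a (2, H, W) array of (horizontal, vertical) displacements.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w, dtype=np.float64),
                       np.arange(h, dtype=np.float64))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)
    # Back-project every pixel to a 3D point in the source camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move the points into the target camera frame.
    pts = T[:3, :3] @ pts + T[:3, 3:4]
    # Project back to the image plane.
    proj = K @ pts
    uv = proj[:2] / proj[2:3]
    return (uv - pix[:2]).reshape(2, h, w)

def total_flow(depth, K, T, residual):
    """Assumed combination: rigid (ego-motion) flow plus residual object flow."""
    return rigid_flow(depth, K, T) + residual

# Toy example: constant depth of 2 and a sideways camera translation of 0.1.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 2.0)
T = np.eye(4)
T[0, 3] = 0.1
flow = rigid_flow(depth, K, T)
print(flow[0, 0, 0], flow[1, 0, 0])  # horizontal shift is fx * tx / Z = 5 pixels
```

In this toy setup every pixel shifts by the same amount because depth is constant. With a real depth map, nearby points move more than distant ones, which is why a residual flow is still needed for objects that move independently of the camera.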

Code & Models

Repositories

angelvillar96/MCDS-VSS (official)

Taxonomy

Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques