MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion
Angel Villar-Corrales, Moritz Austermann, Sven Behnke

TL;DR
MCDS-VSS introduces a self-supervised structured filtering approach that estimates scene geometry, ego-motion, and object motion to enhance temporal consistency in video semantic segmentation for autonomous systems.
Contribution
The paper presents a novel self-supervised model that explicitly learns interpretable scene representations (scene geometry, camera ego-motion, and object motion) and uses them to improve temporal consistency in video semantic segmentation.
Findings
Achieves superior temporal consistency in automotive video sequences.
Maintains competitive semantic segmentation accuracy.
Effectively decouples scene geometry, ego-motion, and object motion.
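The decoupling of geometry, ego-motion, and object motion can be illustrated with a minimal NumPy sketch. It assumes a pinhole camera with intrinsics `K`, a per-pixel depth map for the current frame, and a relative camera pose (`R`, `t`). Depth and pose together predict the "rigid" flow a static scene would produce, and anything left over is attributed to a residual flow from moving objects. The names `rigid_flow` and `total_flow` are hypothetical and do not come from the paper's code. Summing the two flows is also a simplification, since the abstract describes ego-motion and object-motion compensation as two successive steps.

```python
import numpy as np


def rigid_flow(depth, K, R, t):
    """Pixel motion explained by camera ego-motion alone (static-scene assumption).

    depth: (H, W) depth of the current frame
    K:     (3, 3) pinhole intrinsics
    R, t:  (3, 3) rotation and (3,) translation mapping current-frame camera
           coordinates into the previous frame's camera coordinates
    Returns an (H, W, 2) array of (du, dv) offsets from each current pixel to
    where the same static 3D point appears in the previous frame.
    Assumes every transformed point stays in front of the camera (z > 0).
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)   # homogeneous pixel coords
    rays = pix @ np.linalg.inv(K).T                     # back-project: K^-1 p
    points = rays * depth[..., None]                    # 3D points: D(p) K^-1 p
    moved = points @ R.T + t                            # ego-motion: R X + t
    proj = moved @ K.T                                  # re-project with K
    uv_prev = proj[..., :2] / proj[..., 2:3]            # perspective division
    return uv_prev - pix[..., :2]


def total_flow(depth, K, R, t, residual):
    """Rigid flow plus residual flow for independently moving objects.

    Hypothetical simplification: the two components are summed here, whereas
    the paper describes ego-motion and object-motion compensation as two
    successive steps.
    """
    return rigid_flow(depth, K, R, t) + residual
```

As a sanity check, take an identity rotation, a pure sideways translation `t = (0.5, 0, 0)`, constant depth 10, and focal length 100. Every pixel then shifts by 100 · 0.5 / 10 = 5 pixels horizontally and 0 vertically. A nonzero residual would only be needed where an object moves on its own.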
Abstract
Autonomous systems, such as self-driving cars, rely on reliable semantic environment perception for decision making. Despite great advances in video semantic segmentation, existing approaches ignore important inductive biases and lack structured and interpretable internal representations. In this work, we propose MCDS-VSS, a structured filter model that learns in a self-supervised manner to estimate scene geometry and ego-motion of the camera, while also estimating the motion of external objects. Our model leverages these representations to improve the temporal consistency of semantic segmentation without sacrificing segmentation accuracy. MCDS-VSS follows a prediction-fusion approach in which scene geometry and camera motion are first used to compensate for ego-motion, then residual flow is used to compensate for the motion of dynamic objects, and finally the predicted scene features are fused…
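The prediction-fusion idea can be sketched as a single filter update in NumPy. This is a hedged illustration, not the authors' implementation. It assumes a combined flow field (ego-motion flow plus residual flow) is already available, and it moves the previous frame's features into the current view with bilinear backward warping. The abstract is cut off before it describes the fusion, so the sketch assumes a purely hypothetical per-pixel sigmoid gate that blends the warped features with the current frame's features. `backward_warp`, `fuse`, `filter_step`, and `gate_logits` are illustrative names.

```python
import numpy as np


def backward_warp(features, flow):
    """Bilinearly sample `features` (H, W, C) at p + flow(p) for every pixel p.

    flow[..., 0] and flow[..., 1] are horizontal and vertical offsets, in pixels,
    from each current-frame pixel to its location in the previous frame.
    Out-of-image samples are clamped to the border (an assumption).
    """
    H, W, _ = features.shape
    u, v = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    x = np.clip(u + flow[..., 0], 0, W - 1)
    y = np.clip(v + flow[..., 1], 0, H - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    y1 = np.minimum(y0 + 1, H - 1)
    wx = (x - x0)[..., None]  # horizontal interpolation weight
    wy = (y - y0)[..., None]  # vertical interpolation weight
    top = (1 - wx) * features[y0, x0] + wx * features[y0, x1]
    bottom = (1 - wx) * features[y1, x0] + wx * features[y1, x1]
    return (1 - wy) * top + wy * bottom


def fuse(predicted, current, gate_logits):
    """Hypothetical fusion: convex per-pixel blend of predicted and current features.

    gate_logits: (H, W, 1) scores; a learned model would compute them from
    both feature maps. The paper's actual fusion rule is not given in the abstract.
    """
    alpha = 1.0 / (1.0 + np.exp(-gate_logits))  # sigmoid -> weight in (0, 1)
    return alpha * predicted + (1.0 - alpha) * current


def filter_step(prev_features, cur_features, flow, gate_logits):
    """One prediction-fusion update under the assumptions above."""
    predicted = backward_warp(prev_features, flow)       # prediction (motion compensation)
    return fuse(predicted, cur_features, gate_logits)    # fusion with the current frame
```

A gated convex blend is one simple way to let a model fall back on the current frame wherever the motion-compensated prediction is unreliable, for example at occlusions. It is a design choice made here for illustration only.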
Taxonomy
Topics: Video Analysis and Summarization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
