Joint Semantic and Motion Segmentation for dynamic scenes using Deep Convolutional Networks
Nazrul Haque, N Dinesh Reddy, K. Madhava Krishna

TL;DR
This paper presents a deep learning approach that combines semantic and motion features for monocular scene segmentation, improving dynamic scene understanding in outdoor robotic navigation without stereo data.
Contribution
It introduces a novel CNN-based method to fuse semantic and motion cues for monocular segmentation, incorporating optical flow and multi-scale context aggregation.
Findings
Significant improvement over the state of the art on the KITTI dataset
Effective fusion of semantics and motion cues
Enhanced monocular dynamic scene segmentation
Abstract
Dynamic scene understanding is a challenging problem, and motion segmentation plays a crucial role in solving it. Incorporating semantics and motion enhances the overall perception of the dynamic scene. For applications of outdoor robotic navigation, joint learning methods have not been extensively used for extracting spatio-temporal features or adding different priors into the formulation. The task becomes even more challenging when stereo information is not incorporated. This paper proposes an approach to fuse semantic features and motion cues using CNNs, to address the problem of monocular semantic motion segmentation. We deduce semantic and motion labels by integrating optical flow as a constraint with semantic features into a dilated convolution network. The pipeline consists of three main stages, i.e., feature extraction, feature amplification, and multi-scale context aggregation to…
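The abstract names a dilated convolution network and a multi-scale context aggregation stage but gives no layer details in this excerpt. As a rough illustration of why dilation helps aggregate context, the following minimal sketch implements a plain dilated 2D convolution and computes the receptive field of a stack of such layers. The function names, the 3x3 kernel size, and the dilation schedule 1, 2, 4, 8 are assumptions chosen for illustration; they are not taken from the paper.

```python
def dilated_conv2d(image, kernel, dilation=1):
    """Valid-mode 2D cross-correlation with a dilated kernel (no padding).

    `image` and `kernel` are lists of lists of numbers. Hypothetical helper
    for illustration only; not the authors' implementation.
    """
    kh, kw = len(kernel), len(kernel[0])
    # The kernel taps are spread `dilation` pixels apart, so the area it
    # covers grows while the number of weights stays the same.
    eff_h = (kh - 1) * dilation + 1
    eff_w = (kw - 1) * dilation + 1
    out_h = len(image) - eff_h + 1
    out_w = len(image[0]) - eff_w + 1
    if out_h <= 0 or out_w <= 0:
        raise ValueError("image is smaller than the dilated kernel extent")

    out = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0.0
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[y + i * dilation][x + j * dilation]
            row.append(acc)
        out.append(row)
    return out


def receptive_field(kernel_size, dilations):
    """Receptive field (in pixels, per axis) of stacked stride-1 dilated convs."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


# Assumed dilation schedule (illustrative, not from the paper).
ASSUMED_DILATIONS = [1, 2, 4, 8]
context_width = receptive_field(3, ASSUMED_DILATIONS)
```

With the assumed schedule, four 3x3 layers see a 31-pixel-wide window per axis, whereas four undilated 3x3 layers would see only 9 pixels. This growth without pooling is the usual motivation for dilated context modules in dense labeling tasks.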
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
