Self-supervised Learning of Occlusion Aware Flow Guided 3D Geometry Perception with Adaptive Cross Weighted Loss from Monocular Videos
Jiaojiao Fang, Guizhong Liu

TL;DR
This paper introduces a novel self-supervised approach for 3D scene understanding from monocular videos, effectively handling occlusions and moving objects through an adaptive cross-weighted loss and occlusion-aware optical flow guidance.
Contribution
It proposes a learnable occlusion mask with an occlusion-aware photometric loss and an adaptive cross-weighted loss to improve depth and pose estimation in dynamic scenes.
Findings
Achieves promising results on KITTI, Make3D, and Cityscapes datasets.
Demonstrates good generalization under challenging scenarios.
Distinguishes moving objects that violate the static scene assumption.
Abstract
Self-supervised deep learning methods for 3D scene understanding avoid the difficulty of acquiring densely labeled ground truth and have advanced considerably. However, occlusions and moving objects remain major limitations. In this paper, we address these limitations with learnable, occlusion-aware, optical-flow-guided self-supervised depth and camera pose estimation trained with an adaptive cross-weighted loss. First, we train an optical flow network fused with a learnable occlusion mask, using an occlusion-aware photometric loss that exploits temporally supplemental information and the backward-forward consistency of adjacent views. Then, we design an adaptive cross-weighted loss between the depth-pose and optical flow losses of the geometric and photometric errors to distinguish moving objects that violate the static scene assumption. Our method shows…
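To make the backward-forward consistency idea in the abstract concrete, here is a minimal NumPy sketch. It is not the paper's learnable occlusion mask: it uses the common hand-crafted check in which a pixel counts as occluded when its forward flow and the backward flow at its target do not cancel. The function names `fb_occlusion_mask` and `masked_l1`, the thresholds `alpha1` and `alpha2`, and the nearest-neighbour sampling are all illustrative assumptions.

```python
import numpy as np


def fb_occlusion_mask(flow_fw, flow_bw, alpha1=0.01, alpha2=0.5):
    """Hand-crafted forward-backward check (an assumption, not the paper's learned mask).

    flow_fw: (H, W, 2) flow from frame t to t+1, as (dx, dy) per pixel.
    flow_bw: (H, W, 2) flow from frame t+1 to t.
    Returns a boolean (H, W) array that is True where a pixel looks occluded.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Where each pixel of frame t lands in frame t+1 (nearest-neighbour rounding).
    xt = np.rint(xs + flow_fw[..., 0]).astype(int)
    yt = np.rint(ys + flow_fw[..., 1]).astype(int)
    inside = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)

    # Sample the backward flow at the target location (clipped to stay in bounds).
    bw_at_target = flow_bw[np.clip(yt, 0, h - 1), np.clip(xt, 0, w - 1)]

    # For a visible pixel the two flows should roughly cancel.
    diff = np.sum((flow_fw + bw_at_target) ** 2, axis=-1)
    mag = np.sum(flow_fw ** 2, axis=-1) + np.sum(bw_at_target ** 2, axis=-1)
    occluded = diff > alpha1 * mag + alpha2

    # Pixels that leave the image are treated as occluded as well.
    return occluded | ~inside


def masked_l1(img_a, img_b, valid):
    """Mean absolute photometric error over pixels where `valid` is True."""
    err = np.abs(img_a - img_b)
    if err.ndim == 3:  # average over colour channels if present
        err = err.mean(axis=-1)
    return float((err * valid).sum() / max(int(valid.sum()), 1))
```

Calling `masked_l1(frame_t, warped_frame, ~fb_occlusion_mask(flow_fw, flow_bw))` keeps occluded pixels out of the photometric error, which is the general motivation behind an occlusion-aware photometric loss. How the paper learns its mask and weights its losses is not specified in the text above, so none of that is reproduced here.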
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods
Methods: Attentive Walk-Aggregating Graph Neural Network
