TL;DR
This paper introduces a self-supervised monocular depth estimation method that applies distance transforms over pre-semantic contours to improve accuracy in low-texture areas, and reports better results than existing self-supervised methods on KITTI, Cityscapes, Waymo, NYUv2, and ScanNet.
Contribution
Applies a distance transform over jointly estimated pre-semantic contours to add spatial information in uniform regions, which makes the self-supervised training losses for depth and ego-motion more effective.
Findings
Outperforms existing self-supervised methods on KITTI, Cityscapes, Waymo, NYUv2, and ScanNet datasets.
Shows theoretically that the distance transform is the optimal variance-augmenting technique in this setting.
Improves depth and ego-motion estimation in low-texture regions.
Abstract
Monocular depth estimation (MDE) with self-supervised training approaches struggles in low-texture areas, where photometric losses may lead to ambiguous depth predictions. To address this, we propose a novel technique that enhances spatial information by applying a distance transform over pre-semantic contours, augmenting discriminative power in low-texture regions. Our approach jointly estimates pre-semantic contours, depth, and ego-motion. The pre-semantic contours are leveraged to produce new input images, with variance augmented by the distance transform in uniform areas. This approach results in more effective loss functions, enhancing the training process for depth and ego-motion. We demonstrate theoretically that the distance transform is the optimal variance-augmenting technique in this context. Through extensive experiments on KITTI, Cityscapes, Waymo, NYUv2, and ScanNet, our…
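To make the variance-augmentation idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hand-specified binary contour mask (the paper estimates contours jointly with depth and ego-motion), a city-block distance metric, and a breadth-first-search formulation; the function name `distance_transform` and the toy 5x5 grid are illustrative choices only. It shows that a patch with constant intensity has zero variance, while the distance-to-contour map over the same patch does not.

```python
from collections import deque
from statistics import pvariance


def distance_transform(contour):
    """City-block distance from each pixel to the nearest contour pixel.

    `contour` is a list of rows of 0/1 values (1 marks a contour pixel).
    Implemented as a multi-source breadth-first search; the metric is an
    assumption of this sketch, not something the summary specifies.
    """
    h, w = len(contour), len(contour[0])
    dist = [[None] * w for _ in range(h)]
    queue = deque()
    # Seed the search with every contour pixel at distance 0.
    for y in range(h):
        for x in range(w):
            if contour[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    if not queue:
        raise ValueError("contour mask contains no contour pixels")
    # Expand outward one 4-connected step at a time.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


# Toy example: a 5x5 patch of constant intensity whose border is a contour.
size = 5
image = [[0.5] * size for _ in range(size)]
contour = [
    [1 if y in (0, size - 1) or x in (0, size - 1) else 0 for x in range(size)]
    for y in range(size)
]
dist = distance_transform(contour)

# Compare variance over the uniform interior (rows and columns 1..3).
interior = [(y, x) for y in range(1, size - 1) for x in range(1, size - 1)]
intensity_variance = pvariance([image[y][x] for y, x in interior])
distance_variance = pvariance([dist[y][x] for y, x in interior])

print("intensity variance in interior:", intensity_variance)
print("distance-map variance in interior:", distance_variance)
```

In this toy patch the raw intensities give a photometric loss nothing to discriminate between interior pixels, whereas the distance map varies with position relative to the contour. How the method combines the transformed map with the input image is not specified in this summary, so the sketch stops at computing the map.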