Fusion of stereo and still monocular depth estimates in a self-supervised learning context
Diogo Martins, Kevin van Hecke, Guido de Croon

TL;DR
This paper presents a self-supervised learning approach in which a CNN is trained on stereo depth estimates to predict depth from a single image. Its monocular estimates are then fused with the stereo estimates, resulting in more reliable depth maps for autonomous robots.
Contribution
It introduces a novel fusion method that combines stereo and CNN-based monocular depth estimates, enhancing depth accuracy over stereo alone in a self-supervised setting.
Findings
Fused depth estimates outperform stereo estimates alone.
The method improves depth reliability for autonomous navigation.
Experiments on KITTI and a Parrot SLAMDunk validate the approach.
Abstract
We study how autonomous robots can learn by themselves to improve their depth estimation capability. In particular, we investigate a self-supervised learning setup in which stereo vision depth estimates serve as targets for a convolutional neural network (CNN) that transforms a single still image to a dense depth map. After training, the stereo and mono estimates are fused with a novel fusion method that preserves high-confidence stereo estimates, while leveraging the CNN estimates in the low-confidence regions. The main contribution of the article is to show that the fused estimates outperform the stereo vision estimates alone. Experiments are performed on the KITTI dataset and on board a Parrot SLAMDunk, showing that even rather limited CNNs can help provide stereo vision equipped robots with more reliable depth maps for autonomous navigation.
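The abstract describes the fusion only at a high level: keep stereo where it is confident, and lean on the CNN elsewhere. The sketch below illustrates that idea and is not the paper's actual fusion rule. The function name `fuse_depth`, the per-pixel confidence map `stereo_conf` in [0, 1], the `conf_threshold` parameter, and the linear blend below the threshold are all assumptions made for illustration. Missing stereo matches are represented as `None`.

```python
from typing import List, Optional

DepthRow = List[Optional[float]]


def fuse_depth(
    stereo: List[DepthRow],
    stereo_conf: List[List[float]],
    mono: List[List[float]],
    conf_threshold: float = 0.5,  # hypothetical cut-off, not from the paper
) -> List[List[float]]:
    """Illustrative per-pixel fusion of stereo and monocular depth maps.

    - Stereo value missing (None): use the monocular estimate.
    - Stereo confidence >= threshold: keep the stereo estimate unchanged.
    - Otherwise: blend linearly, trusting stereo less as confidence drops.
    """
    fused = []
    for s_row, c_row, m_row in zip(stereo, stereo_conf, mono):
        row = []
        for s, c, m in zip(s_row, c_row, m_row):
            if s is None:
                row.append(m)
            elif c >= conf_threshold:
                row.append(s)
            else:
                w = c / conf_threshold  # weight on stereo, in [0, 1)
                row.append(w * s + (1.0 - w) * m)
        fused.append(row)
    return fused


if __name__ == "__main__":
    stereo = [[2.0, 4.0, None]]   # third pixel has no stereo match
    conf = [[0.9, 0.25, 0.0]]     # assumed stereo confidence per pixel
    mono = [[3.0, 6.0, 5.0]]      # CNN (monocular) depth estimates
    print(fuse_depth(stereo, conf, mono))
```

In this toy row, the first pixel keeps its stereo depth, the second is an equal blend of the two sources, and the third falls back entirely to the monocular estimate. How confidence is actually measured (for example, from matching cost or left-right consistency) is left open here.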
