Image Masking for Robust Self-Supervised Monocular Depth Estimation
Hemang Chawla, Kishaan Jeeveswaran, Elahe Arani, Bahram Zonooz

TL;DR
MIMDepth introduces masked image modeling to enhance the robustness of self-supervised monocular depth estimation against various corruptions and adversarial attacks, improving reliability in real-world scenarios.
Contribution
This paper adapts masked image modeling for direct training of monocular depth estimation, significantly improving robustness to noise, occlusions, and adversarial attacks.
Findings
MIMDepth outperforms existing methods under various corruptions.
The approach enhances robustness to adversarial attacks.
It maintains accurate depth estimation in challenging conditions such as noise and occlusions.
Abstract
Self-supervised monocular depth estimation is a salient task for 3D scene understanding. Learned jointly with monocular ego-motion estimation, several methods have been proposed to predict accurate pixel-wise depth without using labeled data. Nevertheless, these methods focus on improving performance under ideal conditions without natural or digital corruptions. The general absence of occlusions is assumed even for object-specific depth estimation. These methods are also vulnerable to adversarial attacks, which is a pertinent concern for their reliable deployment in robots and autonomous driving systems. We propose MIMDepth, a method that adapts masked image modeling (MIM) for self-supervised monocular depth estimation. While MIM has been used to learn generalizable features during pre-training, we show how it could be adapted for direct training of monocular depth estimation. Our…
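As a rough illustration of the input masking that masked image modeling relies on, the sketch below zeroes out a random subset of non-overlapping patches in an image. It is a minimal example under stated assumptions, not the authors' implementation: the function name `mask_patches`, the patch size, the mask ratio, and the choice of uniform random patch selection are all hypothetical, since the abstract does not specify the masking scheme. How the masked input enters the self-supervised loss is also not covered here.

```python
import numpy as np


def mask_patches(image, patch_size=16, mask_ratio=0.25, rng=None):
    """Zero out a random subset of non-overlapping square patches.

    Hypothetical helper for illustration only.
    image: float array of shape (H, W, C); H and W must be multiples
           of patch_size.
    Returns (masked_image, grid), where grid is a boolean array of shape
    (H // patch_size, W // patch_size) that is True for masked patches.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")

    gh, gw = h // patch_size, w // patch_size
    num_masked = int(round(mask_ratio * gh * gw))

    # Pick which patches to mask, without replacement.
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=num_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)

    # Expand the patch-level grid to a pixel-level mask.
    pixel_mask = grid.repeat(patch_size, axis=0).repeat(patch_size, axis=1)

    # Work on a copy so the caller's image is left untouched.
    masked = image.copy()
    masked[pixel_mask] = 0.0  # assumption: masked pixels are filled with zeros
    return masked, grid


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((64, 128, 3))
    masked, grid = mask_patches(image, patch_size=16, mask_ratio=0.25, rng=rng)
    print(grid.shape, int(grid.sum()))
```

With a 64×128 input and 16-pixel patches, the grid has 4×8 = 32 patches, so a ratio of 0.25 masks 8 of them. Filling with zeros is only one option; a learned mask token or a mean-colour fill would be equally plausible choices.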
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
Methods: Masked Image Modeling
