TL;DR
This paper introduces a scale-invariant self-supervised monocular depth estimation method that detaches scale-sensitive features and boosts scale-invariant features, leading to improved robustness and state-of-the-art accuracy on KITTI.
Contribution
It proposes a novel scale-invariant approach that combines a camera-zoom data augmentation with a dynamic cross-attention module to make depth estimation robust to scale variations.
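The abstract describes the augmentation only as "imitating the camera zooming process." The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes zoom is realized as a center crop by a factor `zoom >= 1`, resized back to the original resolution with nearest-neighbour sampling, and that the 3x3 camera intrinsics `K` are updated so the view-synthesis geometry stays consistent. The function name `zoom_augment` and its parameters are hypothetical.

```python
import numpy as np


def zoom_augment(image, intrinsics, zoom):
    """Hypothetical camera-zoom augmentation (assumed design, not the paper's code).

    image: H x W x C array; intrinsics: 3x3 pinhole matrix K; zoom: factor >= 1.
    Center-crops the image by 1/zoom, resizes it back to H x W, and returns
    the zoomed image together with the correspondingly updated intrinsics.
    """
    if zoom < 1.0:
        raise ValueError("zoom must be >= 1")
    h, w = image.shape[:2]
    # Size and offset of the centered crop that simulates a longer focal length.
    ch, cw = max(1, int(round(h / zoom))), max(1, int(round(w / zoom)))
    top, left = (h - ch) // 2, (w - cw) // 2
    crop = image[top:top + ch, left:left + cw]

    # Nearest-neighbour resize of the crop back to the original resolution.
    rows = np.minimum((np.arange(h) * ch / h).astype(int), ch - 1)
    cols = np.minimum((np.arange(w) * cw / w).astype(int), cw - 1)
    zoomed = crop[rows][:, cols]

    # Cropping shifts the principal point; resizing scales focal lengths and principal point.
    sy, sx = h / ch, w / cw
    K = intrinsics.astype(float).copy()
    K[0, 0] *= sx
    K[1, 1] *= sy
    K[0, 2] = (K[0, 2] - left) * sx
    K[1, 2] = (K[1, 2] - top) * sy
    return zoomed, K
```

For a centered principal point and an even crop, `zoom_augment` doubles the focal lengths at `zoom=2.0` while leaving `(cx, cy)` unchanged. This matches the intuition that zooming magnifies the scene without moving the optical center.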
Findings
Achieves new state-of-the-art performance on the KITTI dataset
Detaching scale-sensitive features improves robustness to scale changes
Boosting scale-invariant features enhances accuracy
Abstract
Monocular depth estimation (MDE) in the self-supervised scenario has emerged as a promising method because it does not require ground-truth depth. Despite continuous efforts, MDE remains sensitive to scale changes, especially when all training samples come from a single camera. The problem is compounded by camera movement, which tightly couples the predicted depth with the scale change. In this paper, we present a scale-invariant approach for self-supervised MDE, in which scale-sensitive features (SSFs) are detached while scale-invariant features (SIFs) are further boosted. Specifically, a simple but effective data augmentation that imitates the camera zooming process is proposed to detach SSFs, making the model robust to scale changes. In addition, a dynamic cross-attention module is designed to boost SIFs by fusing multi-scale cross-attention…
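The abstract is truncated before the cross-attention module is fully described, so the following is only a generic sketch, not the paper's "dynamic" design. It shows plain scaled dot-product cross-attention in which tokens from a finer feature scale query tokens from a coarser scale, followed by a concatenated skip connection. That pairing is assumed from the Softmax and Concatenated Skip Connection entries in this page's method taxonomy. The names `cross_attention_fuse`, `fine`, `coarse`, `wq`, `wk` and `wv` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention_fuse(fine, coarse, wq, wk, wv):
    """Hypothetical multi-scale cross-attention fusion (generic, not the paper's module).

    fine:   (N, C) flattened tokens from a higher-resolution feature map.
    coarse: (M, C) flattened tokens from a lower-resolution feature map.
    wq, wk, wv: (C, D) projection matrices.
    Returns the fused (N, C + D) features and the (N, M) attention weights.
    """
    q = fine @ wq      # queries come from the fine scale
    k = coarse @ wk    # keys come from the coarse scale
    v = coarse @ wv    # values come from the coarse scale
    # Each fine token distributes attention over all coarse tokens; rows sum to 1.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    attended = attn @ v
    # Concatenated skip connection: keep the original fine features alongside the context.
    return np.concatenate([fine, attended], axis=-1), attn
```

Concatenating the input with the attended context, rather than adding them, keeps the original fine-scale features intact and lets a later layer learn how much cross-scale context to use. A module with input-dependent ("dynamic") weighting would replace the fixed projections here.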
Taxonomy
Methods: Softmax · Concatenated Skip Connection
