NimbleD: Enhancing Self-supervised Monocular Depth Estimation with Pseudo-labels and Large-scale Video Pre-training
Albert Luginov, Muhammad Shahzad

TL;DR
NimbleD is an efficient self-supervised learning framework for monocular depth estimation. It combines pseudo-labels from a large vision model with large-scale video pre-training, bringing fast, lightweight models close to state-of-the-art self-supervised performance without requiring camera intrinsics.
Contribution
It introduces a novel, efficient self-supervised learning approach that leverages pseudo-labels and large-scale video pre-training, eliminating the need for camera intrinsics.
Findings
Achieves performance comparable to state-of-the-art self-supervised monocular depth estimation models
Enables large-scale pre-training on publicly available videos
Maintains low latency suitable for AR/VR applications
Abstract
We introduce NimbleD, an efficient self-supervised monocular depth estimation learning framework that incorporates supervision from pseudo-labels generated by a large vision model. This framework does not require camera intrinsics, enabling large-scale pre-training on publicly available videos. Our straightforward yet effective learning strategy significantly enhances the performance of fast and lightweight models without introducing any overhead, allowing them to achieve performance comparable to state-of-the-art self-supervised monocular depth estimation models. This advancement is particularly beneficial for virtual and augmented reality applications requiring low latency inference. The source code, model weights, and acknowledgments are available at https://github.com/xapaxca/nimbled.
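The idea of adding pseudo-label supervision to a self-supervised objective can be illustrated with a small sketch. Everything here is an assumption made for illustration, not the authors' implementation: the abstract does not specify the loss, so this sketch assumes the pseudo-labels are only meaningful up to an unknown scale and shift, aligns the prediction to them by least squares, and adds the resulting error to a precomputed photometric term. The function names and the `weight` parameter are hypothetical.

```python
from statistics import fmean


def align_scale_shift(pred, target):
    """Least-squares scale s and shift t so that s * pred + t approximates target."""
    mean_p, mean_t = fmean(pred), fmean(target)
    var_p = fmean([(p - mean_p) ** 2 for p in pred])
    if var_p == 0.0:
        # A constant prediction carries no scale information; fit the shift only.
        return 0.0, mean_t
    cov = fmean([(p - mean_p) * (t - mean_t) for p, t in zip(pred, target)])
    scale = cov / var_p
    return scale, mean_t - scale * mean_p


def pseudo_label_loss(pred, pseudo):
    """Mean absolute error between the aligned prediction and the pseudo-label."""
    scale, shift = align_scale_shift(pred, pseudo)
    return fmean([abs(scale * p + shift - t) for p, t in zip(pred, pseudo)])


def total_loss(photometric, pred, pseudo, weight=0.5):
    """Hypothetical combined objective: self-supervised term plus weighted pseudo-label term."""
    return photometric + weight * pseudo_label_loss(pred, pseudo)


# Toy flattened depth maps: the pseudo-label is an exact affine function of the prediction.
pred = [1.0, 2.0, 3.0, 4.0]
pseudo = [2.0 * p + 1.0 for p in pred]
print(align_scale_shift(pred, pseudo))
print(total_loss(0.3, pred, pseudo))
```

In this toy case the alignment recovers the affine relation, so the pseudo-label term vanishes and only the photometric term remains. Because the extra term is applied only during training, a sketch like this is consistent with the abstract's claim that the strategy adds no overhead at inference time.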
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
