TL;DR
This paper introduces a self-supervised method for monocular depth estimation from aerial imagery. It needs no annotated training data: depth and pose are learned jointly from image sequences captured by a single moving camera, and the resulting model is small enough to favor real-time use.
Contribution
The paper presents a self-supervised approach to aerial monocular depth estimation that jointly learns depth and pose from single-camera image sequences without annotated data. Sharing weights between the depth and pose estimators keeps the model relatively small, which favors real-time application.
Findings
Achieves a δ1.25 threshold accuracy of up to 93.5%
Demonstrates good generalization to unseen data
Provides useful initializations for image matching methods
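The δ1.25 figure above refers to the threshold accuracy commonly used in depth estimation: the fraction of pixels whose predicted depth is within a factor of 1.25 of the reference depth. The following is a minimal sketch of that standard metric, not the paper's evaluation code. The function name `threshold_accuracy`, the masking of non-positive depths, and the toy values are assumptions made for illustration.

```python
import numpy as np

def threshold_accuracy(pred, gt, threshold=1.25):
    """Fraction of valid pixels where max(pred/gt, gt/pred) < threshold.

    Assumption: pixels with non-positive reference or predicted depth
    are treated as invalid and excluded from the score.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = (gt > 0) & (pred > 0)
    if not valid.any():
        raise ValueError("no valid pixels to evaluate")
    # Symmetric ratio: penalizes over- and under-estimation equally.
    ratio = np.maximum(pred[valid] / gt[valid], gt[valid] / pred[valid])
    return float(np.mean(ratio < threshold))

# Toy 2x2 depth maps (hypothetical values, in meters).
gt = np.array([[10.0, 20.0], [40.0, 0.0]])    # 0.0 marks a missing reference depth
pred = np.array([[11.0, 30.0], [36.0, 5.0]])
score = threshold_accuracy(pred, gt)          # 2 of the 3 valid pixels pass
```

Because the ratio is symmetric, a prediction that is 20% too deep and one that is 20% too shallow are scored alike; only the relative error matters, not the absolute depth range of the scene.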
Abstract
Supervised learning based methods for monocular depth estimation usually require large amounts of extensively annotated training data. In the case of aerial imagery, this ground truth is particularly difficult to acquire. Therefore, in this paper, we present a method for self-supervised learning for monocular depth estimation from aerial imagery that does not require annotated training data. For this, we only use an image sequence from a single moving camera and learn to simultaneously estimate depth and pose information. By sharing the weights between pose and depth estimation, we achieve a relatively small model, which favors real-time application. We evaluate our approach on three diverse datasets and compare the results to conventional methods that estimate depth maps based on multi-view geometry. We achieve an accuracy δ1.25 of up to 93.5%. In addition, we have paid…
