Robust Monocular Localization of Drones by Adapting Domain Maps to Depth Prediction Inaccuracies
Priyesh Shukla, Sureshkumar S., Alex C. Stutts, Sathya Ravi, Theja Tulabandhula, and Amit R. Trivedi

TL;DR
This paper introduces a monocular drone localization method that jointly trains deep learning-based depth prediction with Bayesian filtering-based pose reasoning, maintaining pose accuracy under poor depth estimates and extreme lighting variations without explicit domain adaptation.
Contribution
It proposes a cross-modal framework that jointly trains depth prediction and pose reasoning, improving scalability and environmental robustness over deep learning-only methods.
Findings
Maintains pose accuracy with little-to-no degradation, even with extremely poor depth estimates from a lightweight depth predictor.
Performs well under extreme lighting conditions without explicit domain adaptation.
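Openly represents the map and intermediate feature maps (such as depth estimates), enabling faster updates and reuse of intermediate predictions for other tasks such as obstacle avoidance.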
Abstract
We present a novel monocular localization framework by jointly training deep learning-based depth prediction and Bayesian filtering-based pose reasoning. The proposed cross-modal framework significantly outperforms deep learning-only predictions with respect to model scalability and tolerance to environmental variations. Specifically, we show little-to-no degradation of pose accuracy even with extremely poor depth estimates from a lightweight depth predictor. Our framework also maintains high pose accuracy in extreme lighting variations compared to standard deep learning, even without explicit domain adaptation. By openly representing the map and intermediate feature maps (such as depth estimates), our framework also allows for faster updates and reusing intermediate predictions for other tasks, such as obstacle avoidance, resulting in much higher resource efficiency.
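To make the idea of pairing a depth predictor with Bayesian filtering concrete, here is a minimal sketch. It is not the authors' method: it swaps in a toy 1D particle filter, assumes a single wall at a known position as the "map", and simulates the depth predictor as the true depth plus Gaussian noise. All function names, parameters, and noise values are hypothetical and chosen only for illustration. The sketch shows how filtering over time can keep a pose estimate usable even when each individual depth reading is noisy.

```python
import numpy as np


def gaussian_likelihood(predicted_depth, expected_depth, sigma):
    """Unnormalized Gaussian likelihood of one depth reading, per particle."""
    return np.exp(-0.5 * ((predicted_depth - expected_depth) / sigma) ** 2)


def particle_filter_step(particles, control, predicted_depth, wall_position,
                         motion_sigma, depth_sigma, rng):
    """One predict/update/resample cycle of a 1D particle filter (toy model)."""
    # Predict: apply the commanded motion plus process noise to every particle.
    particles = particles + control + rng.normal(0.0, motion_sigma,
                                                 size=particles.shape)
    # Update: the assumed map gives the depth each particle would expect to see.
    expected_depth = wall_position - particles
    weights = gaussian_likelihood(predicted_depth, expected_depth, depth_sigma)
    weights = weights + 1e-300  # guard against an all-zero weight vector
    weights = weights / weights.sum()
    # Resample: draw particles in proportion to their weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], weights


def run_demo(seed=0, steps=30):
    """Simulate a drone moving toward a wall with a noisy depth predictor."""
    rng = np.random.default_rng(seed)
    wall_position = 10.0   # assumed map: one wall at x = 10
    true_x = 1.0           # ground-truth position, unknown to the filter
    particles = rng.uniform(0.0, 10.0, size=2000)  # uninformed prior
    weights = np.full(particles.shape, 1.0 / len(particles))
    for _ in range(steps):
        control = 0.2
        true_x += control
        # Stand-in for a lightweight depth network: true depth plus heavy noise.
        noisy_depth = (wall_position - true_x) + rng.normal(0.0, 1.0)
        particles, weights = particle_filter_step(
            particles, control, noisy_depth, wall_position,
            motion_sigma=0.05, depth_sigma=1.0, rng=rng)
    return true_x, float(particles.mean()), weights


if __name__ == "__main__":
    true_x, estimate, _ = run_demo()
    print(f"true position: {true_x:.2f}, filtered estimate: {estimate:.2f}")
```

In this toy setting the per-reading depth noise has a standard deviation of 1.0, yet the filtered estimate is typically much tighter because the filter accumulates evidence across steps. A real system would replace the single-wall map and scalar depth with a 3D map and a dense depth image.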
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Human Pose and Action Recognition
