Vision Foundation Models for Domain Generalisable Cross-View Localisation in Planetary Ground-Aerial Robotic Teams

Lachlan Holden; Feras Dayoub; Alberto Candela; David Harvey; Tat-Jun Chin

arXiv:2601.09107·cs.CV·April 30, 2026

TL;DR

This paper introduces a deep learning approach for localising planetary rovers in aerial maps, combining semantic segmentation, synthetic data, and particle filters to improve accuracy in challenging environments.

Contribution

The paper proposes a dual-encoder neural network that leverages vision foundation models and synthetic data to bridge the domain gap in cross-view localisation for planetary robotics.
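As a rough illustration of how a dual-encoder setup can score candidate positions, the sketch below embeds a ground-view input and several aerial-map patches into a shared space and ranks the patches by cosine similarity. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names (`l2_normalise`, `localise`), the identity stand-in encoders, and the toy feature vectors are all hypothetical, and the summary does not describe the actual inference procedure.

```python
import math


def l2_normalise(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def localise(ground_input, aerial_patches, ground_encoder, aerial_encoder):
    """Rank aerial patches by cosine similarity to the ground-view embedding.

    ground_encoder and aerial_encoder are hypothetical callables that map an
    input to a list of floats in a shared embedding space. In practice these
    would be two separately trained networks (assumption).
    """
    if not aerial_patches:
        raise ValueError("at least one aerial patch is required")
    query = l2_normalise(ground_encoder(ground_input))
    scores = []
    for patch in aerial_patches:
        candidate = l2_normalise(aerial_encoder(patch))
        # Dot product of unit vectors equals their cosine similarity.
        scores.append(sum(q * c for q, c in zip(query, candidate)))
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return best_index, scores


def identity(features):
    """Toy stand-in for a learned encoder (assumption for illustration only)."""
    return list(features)


# Toy example: three candidate patches, already expressed as feature vectors.
candidates = [[0.0, 1.0], [2.0, 0.0], [3.0, 4.0]]
best_index, scores = localise([1.0, 0.0], candidates, identity, identity)
print(best_index, scores)
```

In a fuller pipeline, per-patch scores like these could serve as measurement likelihoods when weighting the hypotheses of a particle filter, which the TL;DR lists as one of the method's components; how the paper actually couples the two is not stated in this summary.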

Findings

01 Achieves accurate rover localisation over complex trajectories.

02 Uses synthetic data and semantic segmentation to improve real-world performance.

03 Provides a new dataset of real-world rover trajectories and synthetic image pairs.

Abstract

Accurate localisation in planetary robotics enables the advanced autonomy required to support the increased scale and scope of future missions. The successes of the Ingenuity helicopter and multiple planetary orbiters lay the groundwork for future missions that use ground-aerial robotic teams. In this paper, we consider rovers using machine learning to localise themselves in a local aerial map using limited field-of-view monocular ground-view RGB images as input. A key consideration for machine learning methods is that real space data with ground-truth position labels suitable for training is scarce. In this work, we propose a novel method of localising rovers in an aerial map using cross-view-localising dual-encoder deep neural networks. We leverage semantic segmentation with vision foundation models and high-volume synthetic data to bridge the domain gap to real images. We also…
