Cross-View Image Sequence Geo-localization
Xiaohan Zhang, Waqas Sultani, Safwan Wshah

TL;DR
This paper introduces an end-to-end cross-view geo-localization method that takes sequences of limited field-of-view (FOV) ground images, combines them with attention-based temporal feature aggregation, and outperforms the baseline approaches it is compared against.
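The attention-based temporal aggregation described above can be illustrated with a minimal sketch, assuming each frame has already been encoded into a fixed-length descriptor. This is not the authors' module: it uses a simple scaled dot-product attention pooling against a single query vector (here a hypothetical stand-in for learned parameters), and the names `attention_pool` and `softmax` are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the maximum score before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Vector], query: Vector) -> List[float]:
    """Aggregate per-frame descriptors into one sequence descriptor.

    Each frame is scored by a scaled dot product with `query` (a hypothetical
    learned vector); the softmax of those scores weights a sum of the frames.
    """
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(query)
    scores = [sum(f[i] * query[i] for i in range(dim)) / math.sqrt(dim) for f in frames]
    weights = softmax(scores)
    # Weighted sum over frames, computed one feature dimension at a time.
    return [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]


# Frames that align with the query receive larger weights.
descriptor = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[1.0, 0.0])
```

Because the weights come from a softmax, the output is a convex combination of the frame descriptors, so sequences of any length map to a descriptor of the same size.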
Contribution
The paper presents the first cross-view geo-localization method that operates on sequences of limited-FOV images, using temporal attention to aggregate frames and a sequential dropout scheme for robustness to varying sequence lengths and GPS noise.
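A sequential dropout scheme that simulates variable-length sequences could look roughly like the sketch below. It assumes the scheme removes whole frames at random during training while keeping the survivors in temporal order; the function name `sequential_dropout` and the `drop_prob` parameter are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sequential_dropout(frames: Sequence[T], drop_prob: float, rng: random.Random) -> List[T]:
    """Randomly drop whole frames, keeping temporal order and at least one frame.

    `drop_prob` is a hypothetical per-frame drop probability in [0, 1).
    """
    if not frames:
        raise ValueError("need at least one frame")
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    # Keep each frame independently; iterating in order preserves temporal order.
    kept = [f for f in frames if rng.random() >= drop_prob]
    if not kept:
        # Never return an empty sequence: fall back to one randomly chosen frame.
        kept = [frames[rng.randrange(len(frames))]]
    return kept


rng = random.Random(0)
shorter = sequential_dropout(list(range(7)), drop_prob=0.3, rng=rng)
```

Training on such randomly shortened sequences exposes the aggregation module to many sequence lengths, which is one plausible way to obtain the robustness to sequence length reported in the findings.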
Findings
Outperforms several baseline methods in accuracy.
Introduces a new large-scale dataset for evaluation.
Demonstrates robustness to sequence length and GPS noise.
Abstract
Cross-view geo-localization aims to estimate the GPS location of a query ground-view image by matching it to images from a reference database of geo-tagged aerial images. To address this challenging problem, recent approaches use panoramic ground-view images to increase the range of visibility. Although appealing, panoramic images are not as readily available as videos of limited Field-Of-View (FOV) images. In this paper, we present the first cross-view geo-localization method that works on a sequence of limited FOV images. Our model is trained end-to-end to capture the temporal structure within the frames using an attention-based temporal feature aggregation module. To robustly handle different sequence lengths and GPS noise during inference, we propose a sequential dropout scheme that simulates variable-length sequences. To evaluate the proposed approach in…
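The matching step described in the abstract, retrieving the geo-tagged aerial image closest to a ground query, can be sketched as nearest-neighbour search over descriptors. This sketch assumes query and aerial descriptors already live in a shared embedding space and are compared by cosine similarity; the top-K recall metric shown is common in cross-view geo-localization, but whether this paper reports exactly this metric is an assumption. The names `rank_aerial` and `recall_at_k` are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    # Unit-length descriptors make the dot product equal to cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def rank_aerial(query: Vector, database: Sequence[Vector]) -> List[int]:
    """Return aerial database indices sorted from most to least similar."""
    q = l2_normalize(query)
    sims = [sum(a * b for a, b in zip(q, l2_normalize(d))) for d in database]
    return sorted(range(len(database)), key=lambda i: sims[i], reverse=True)


def recall_at_k(queries: Sequence[Vector], database: Sequence[Vector],
                true_idx: Sequence[int], k: int) -> float:
    """Fraction of queries whose true aerial match appears among the top-k results."""
    hits = sum(1 for q, t in zip(queries, true_idx) if t in rank_aerial(q, database)[:k])
    return hits / len(queries)


aerial = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ranking = rank_aerial([0.9, 0.1], aerial)
```

In a sequence-based setting, each query descriptor would come from aggregating a whole sequence of ground frames rather than a single image, and the predicted GPS location is that of the top-ranked aerial image.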
Videos
Cross-View Image Sequence Geo-localization (YouTube)
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
Methods: Dropout · Greedy Policy Search
