TL;DR
This paper introduces OCGNet, a novel object-level cross-view geo-localization network that leverages location cues, enhancement modules, and multi-head attention to improve accuracy and generalization in challenging scenarios.
Contribution
It proposes a new network architecture that integrates location information and attention mechanisms for precise object-level geo-localization across views.
Findings
Achieves state-of-the-art results on the CVOGL dataset.
Demonstrates effective few-shot learning capabilities.
Robustly localizes objects despite viewpoint and condition variations.
Abstract
Cross-view geo-localization determines the location of a query image, captured by a drone or ground-based camera, by matching it to a geo-referenced satellite image. While traditional approaches focus on image-level localization, many applications, such as search-and-rescue, infrastructure inspection, and precision delivery, demand object-level accuracy. This enables users to prompt a specific object with a single click on a drone image to retrieve precise geo-tagged information of the object. However, variations in viewpoints, timing, and imaging conditions pose significant challenges, especially when identifying visually similar objects in extensive satellite imagery. To address these challenges, we propose an Object-level Cross-view Geo-localization Network (OCGNet). It integrates user-specified click locations using Gaussian Kernel Transfer (GKT) to preserve location information…
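The abstract is cut off before it explains how GKT works, so the sketch below does not reproduce the paper's method. It only illustrates the general idea of turning a single click into a soft spatial location cue: a 2D Gaussian heatmap centered on the clicked pixel. The function name `click_heatmap`, the `sigma` parameter, and the grid size are all hypothetical choices made for illustration.

```python
import math

def click_heatmap(height, width, click_row, click_col, sigma=2.0):
    """Build a height x width grid that peaks at 1.0 on the clicked pixel.

    Assumption: an isotropic Gaussian is used as the location cue. This is
    an illustrative stand-in, not the GKT module described in the paper.
    """
    heatmap = []
    for r in range(height):
        row = []
        for c in range(width):
            # Squared Euclidean distance from this pixel to the click.
            d2 = (r - click_row) ** 2 + (c - click_col) ** 2
            row.append(math.exp(-d2 / (2.0 * sigma ** 2)))
        heatmap.append(row)
    return heatmap

# Hypothetical 5x5 image with a click at row 2, column 3.
heatmap = click_heatmap(5, 5, click_row=2, click_col=3, sigma=1.0)

# Locate the strongest response; it should coincide with the click.
peak = max((v, r, c) for r, row in enumerate(heatmap) for c, v in enumerate(row))
```

A map like this could be concatenated with the image as an extra input channel so that a network receives both appearance and the prompted location. Whether OCGNet does this is not stated in the text available here.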
Taxonomy
Methods: Softmax · Attention Is All You Need · Focus
