Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
Brandon Clark, Alec Kerrigan, Parth Parag Kulkarni, Vicente Vivanco Cepeda, Mubarak Shah

TL;DR
This paper introduces a transformer-based hierarchical approach for precise worldwide image geo-localization, leveraging scene and geographic cues, achieving state-of-the-art results on multiple datasets including a new challenging global street-level dataset.
Contribution
The paper proposes an end-to-end transformer architecture that models relationships between geographic hierarchies and scene types, improving geo-localization accuracy and generalization.
Findings
Achieved state-of-the-art accuracy on 4 standard datasets.
Demonstrated the effectiveness of hierarchical cross-attention.
Introduced a new challenging global street-level dataset.
Abstract
Determining the exact latitude and longitude that a photo was taken is a useful and widely applicable task, yet it remains exceptionally difficult despite the accelerated progress of other computer vision tasks. Most previous approaches have opted to learn a single representation of query images, which are then classified at different levels of geographic granularity. These approaches fail to exploit the different visual cues that give context to different hierarchies, such as the country, state, and city level. To this end, we introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels (which we refer to as hierarchies) and the corresponding visual scene information in an image through hierarchical cross-attention. We achieve this by learning a query for each geographic hierarchy and scene type. Furthermore, we learn a…
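The mechanism the abstract describes, a learned query per geographic hierarchy and per scene type attending over image features, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the scene names, the dimensions, the random vectors standing in for learned parameters, and the choice of one query per hierarchy plus one per scene type are all hypothetical. Learned projections, multiple heads, and the classification heads are omitted.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, tokens):
    """Single-head scaled dot-product cross-attention.

    queries: list of query vectors (each of length d)
    tokens:  list of image-patch feature vectors (each of length d)
    The tokens serve as both keys and values (a simplification:
    no learned key/value projections).
    Returns (outputs, weights), one row per query.
    """
    d = len(tokens[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every image patch.
        scores = [scale * sum(qi * ti for qi, ti in zip(q, t)) for t in tokens]
        w = softmax(scores)
        # Weighted average of patch features under this query's attention.
        out = [sum(w[j] * tokens[j][k] for j in range(len(tokens)))
               for k in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


rng = random.Random(0)
DIM = 8           # hypothetical feature dimension
NUM_PATCHES = 16  # hypothetical number of image patches

# Hierarchy levels named in the abstract; the scene types are hypothetical.
HIERARCHIES = ["country", "state", "city"]
SCENES = ["indoor", "natural", "urban"]
QUERY_NAMES = HIERARCHIES + SCENES

# Random vectors stand in for learned query embeddings and encoder outputs.
queries = [[rng.gauss(0, 1) for _ in range(DIM)] for _ in QUERY_NAMES]
patch_features = [[rng.gauss(0, 1) for _ in range(DIM)]
                  for _ in range(NUM_PATCHES)]

outputs, weights = cross_attention(queries, patch_features)

for name, w in zip(QUERY_NAMES, weights):
    top_patch = max(range(NUM_PATCHES), key=lambda j: w[j])
    print(f"{name:>8}: attends most to patch {top_patch}")
```

Each query produces its own attention distribution over the patches, so a "city" query and a "country" query can, in principle, focus on different visual cues in the same image. In a trained model the per-query outputs would feed classifiers at the corresponding geographic level.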
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Remote-Sensing Image Classification
