GAMa: Cross-view Video Geo-localization

Shruti Vyas, Chen Chen, and Mubarak Shah

arXiv:2207.02431 · cs.CV · July 7, 2022 · 1 citation

TL;DR

This paper introduces GAMa, a new large-scale dataset of ground videos and aerial images for cross-view geo-localization, along with a novel hierarchical method that improves localization accuracy.

Contribution

The paper presents the first dataset for ground video to aerial image geo-localization and proposes a hierarchical approach to enhance clip-level localization accuracy.
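This page does not describe how the hierarchy is built. Purely as an illustration, assuming a generic coarse-to-fine retrieval scheme in which large aerial tiles are searched first and only the smaller tiles inside the best-scoring large tiles are ranked afterwards, a NumPy sketch could look like the following. The function names, the `fine_parent` mapping, and the `k_coarse` parameter are hypothetical, and this is not necessarily the hierarchy used in GAMa.

```python
import numpy as np

# Assumption: a generic coarse-to-fine retrieval scheme, shown for illustration;
# it is not necessarily the hierarchical method used in GAMa.

def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def hierarchical_retrieve(query, coarse_emb, fine_emb, fine_parent, k_coarse=2):
    """Hypothetical two-stage retrieval for one ground video clip.

    query:       (d,) embedding of the clip
    coarse_emb:  (C, d) embeddings of large aerial tiles
    fine_emb:    (F, d) embeddings of small aerial tiles
    fine_parent: (F,) index of the large tile containing each small tile
    Returns small-tile indices ranked by cosine similarity, restricted to
    children of the k_coarse best-matching large tiles.
    """
    q = l2_normalize(query)
    # Stage 1: rank the large tiles and keep the top k_coarse.
    coarse_scores = l2_normalize(coarse_emb) @ q
    top_coarse = np.argsort(-coarse_scores)[:k_coarse]
    # Stage 2: rank only the small tiles whose parent survived stage 1.
    candidates = np.flatnonzero(np.isin(fine_parent, top_coarse))
    fine_scores = l2_normalize(fine_emb[candidates]) @ q
    return candidates[np.argsort(-fine_scores)]
```

The appeal of such a design is that the fine-grained search only touches a small subset of the gallery, so a mistake can only be corrected at the second stage if the right large tile survives the first.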

Findings

01

Achieved a Top-1 recall rate of 19.4% for geo-localization.

02

Achieved a recall rate of 45.1% within 1 mile.

03

Demonstrated the effectiveness of the hierarchical approach.
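Both headline numbers are retrieval metrics. The sketch below shows one common way to compute them, assuming a query-by-gallery similarity matrix, exactly one ground-truth aerial image per query, and (as an assumption, since the exact protocol is not given here) that the 1-mile figure counts a query as correct when its top-1 aerial image lies within 1 mile of the true location. All function names are hypothetical.

```python
import numpy as np

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

def top_k_recall(sim: np.ndarray, gt: np.ndarray, k: int = 1) -> float:
    """Fraction of queries whose ground-truth gallery index is in the top k.

    sim: (Q, G) similarity of each query clip to each aerial image
    gt:  (Q,) index of the correct aerial image for each query
    """
    top_k = np.argsort(-sim, axis=1)[:, :k]
    hits = (top_k == gt[:, None]).any(axis=1)
    return float(hits.mean())

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * np.arcsin(np.sqrt(a))

def recall_within(sim, gallery_latlon, query_latlon, miles=1.0):
    """Assumed definition: fraction of queries whose top-1 aerial image
    centre lies within `miles` of the query's true (lat, lon)."""
    pred = np.argmax(sim, axis=1)
    d = haversine_miles(query_latlon[:, 0], query_latlon[:, 1],
                        gallery_latlon[pred, 0], gallery_latlon[pred, 1])
    return float((d <= miles).mean())
```

A distance-thresholded recall is more forgiving than Top-1 recall because a wrong but nearby aerial image still counts, which is consistent with the 1-mile number being the higher of the two.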

Abstract

The existing work in cross-view geo-localization is based on images, where a ground panorama is matched to an aerial image. In this work, we focus on ground videos instead of images, which provide additional contextual cues that are important for this task. There are no existing datasets for this problem; therefore, we propose the GAMa dataset, a large-scale dataset with ground videos and corresponding aerial images. We also propose a novel approach to solve this problem. At the clip level, a short video clip is matched with its corresponding aerial image, and these clip-level matches are later used to obtain video-level geo-localization of a long video. Moreover, we propose a hierarchical approach to further improve the clip-level geo-localization. The dataset is challenging because the videos are unaligned and have a limited field of view, and our proposed method achieves a Top-1 recall rate of 19.4% and a recall of 45.1% @1.0 mile. Code and dataset are available at…
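The clip-to-video pipeline described in the abstract can be pictured with the sketch below. It assumes clip and aerial-image embeddings already come from some encoder (not shown), uses cosine similarity for matching, drops leftover frames that do not fill a whole clip, and treats the sequence of top-1 aerial locations as a coarse video-level trajectory. These are illustrative assumptions rather than the authors' method, and every function name here is hypothetical.

```python
import numpy as np

def split_into_clips(frames: np.ndarray, clip_len: int) -> list:
    """Split a long video of shape (T, ...) into consecutive,
    non-overlapping clips of clip_len frames.
    Assumption: trailing frames that do not fill a whole clip are dropped."""
    n = len(frames) // clip_len
    return [frames[i * clip_len:(i + 1) * clip_len] for i in range(n)]

def localize_video(clip_emb, aerial_emb, aerial_latlon):
    """Hypothetical video-level localization from clip-level matches.

    clip_emb:      (N, d) one embedding per clip (encoder not shown)
    aerial_emb:    (G, d) one embedding per aerial image
    aerial_latlon: (G, 2) centre coordinates of each aerial image
    Returns the top-1 aerial index per clip and the matching coordinates,
    i.e. a coarse trajectory for the whole video.
    """
    c = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)
    a = aerial_emb / np.linalg.norm(aerial_emb, axis=1, keepdims=True)
    best = np.argmax(c @ a.T, axis=1)  # top-1 aerial image for each clip
    return best, aerial_latlon[best]
```

For example, a 10-frame video split with `clip_len=3` yields three clips, and each clip contributes one point to the trajectory.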

Code & Models

Repositories

svyas23/gama
PyTorch · Official

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Robotics and Sensor-Based Localization

Methods: Contrastive Language-Image Pre-training