GAReT: Cross-view Video Geolocalization with Adapters and Auto-Regressive Transformers

Manu S Pillai; Mamshad Nayeem Rizve; Mubarak Shah

arXiv:2408.02840·cs.CV·August 7, 2024

TL;DR

GAReT is a transformer-based approach for cross-view video geolocalization that does not rely on camera or odometry data. It introduces two modules, GeoAdapter and TransRetriever, to reduce computational cost and improve the temporal consistency of predicted trajectories, and it reports state-of-the-art results on benchmark datasets.

Contribution

The paper introduces GAReT, a fully transformer-based CVGL method with GeoAdapter and TransRetriever modules, eliminating the need for camera and odometry data and enhancing temporal consistency.

Findings

01. Achieves state-of-the-art performance on benchmark datasets.
02. Does not require camera or odometry data, reducing reliance on additional sensors.
03. Improves temporal consistency of GPS trajectories.

Abstract

Cross-view video geo-localization (CVGL) aims to derive GPS trajectories from street-view videos by aligning them with aerial-view images. Despite their promising performance, current CVGL methods face significant challenges. These methods use camera and odometry data, typically absent in real-world scenarios. They utilize multiple adjacent frames and various encoders for feature extraction, resulting in high computational costs. Moreover, these approaches independently predict each street-view frame's location, resulting in temporally inconsistent GPS trajectories. To address these challenges, in this work, we propose GAReT, a fully transformer-based method for CVGL that does not require camera and odometry data. We introduce GeoAdapter, a transformer-adapter module designed to efficiently aggregate image-level representations and adapt them for video inputs. Specifically, we train a…
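The retrieval setup the abstract describes can be illustrated with a minimal sketch. It shows the per-frame baseline that the abstract criticizes, not GAReT itself: each street-view frame is matched independently to its most similar aerial image, and that image's GPS tag becomes the frame's location. The function names, the toy 2-D embeddings, and the coordinates are all hypothetical and chosen only for illustration; a real system would obtain embeddings from trained encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_trajectory(frame_embeddings, aerial_embeddings, aerial_gps):
    """Assign each frame the GPS tag of its nearest aerial embedding.

    Every frame is handled independently, so nothing forces
    consecutive predictions to lie near each other.
    """
    trajectory = []
    for frame in frame_embeddings:
        best = max(
            range(len(aerial_embeddings)),
            key=lambda i: cosine(frame, aerial_embeddings[i]),
        )
        trajectory.append(aerial_gps[best])
    return trajectory


# Hypothetical toy data: three aerial tiles with GPS tags, three video frames.
aerial_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
aerial_gps = [(28.60, -81.20), (28.61, -81.21), (28.62, -81.22)]
frame_embeddings = [[0.9, 0.1], [0.6, 0.8], [0.1, 0.9]]

trajectory = retrieve_trajectory(frame_embeddings, aerial_embeddings, aerial_gps)
print(trajectory)
```

Because each frame is scored on its own, the predicted trajectory can jump between distant reference images from one frame to the next. This is the temporal inconsistency the abstract refers to.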

Code & Models

Repositories

manupillai308/garet (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Greedy Policy Search