Co-visual pattern augmented generative transformer learning for automobile geo-localization
Jianwei Zhao, Qiang Zhai, Pengbo Zhao, Rui Huang, Hong Cheng

TL;DR
This paper introduces a novel transformer-based method with co-visual pattern augmentation for cross-view geo-localization, significantly improving accuracy in matching aerial and ground images for vehicle localization.
Contribution
It proposes a mutual generative transformer learning framework that leverages cross-view knowledge generation and cascaded attention masking to enhance geo-localization accuracy.
Findings
Sets new state-of-the-art results on CVACT and CVUSA benchmarks.
Demonstrates the effectiveness of mutual generative transformers in cross-view matching.
Improves accuracy over existing Siamese-like architectures.
Abstract
Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL), which aims to estimate the geographical location of a ground-level camera by matching its image against a large set of geo-tagged aerial (e.g., satellite) images, has received considerable attention but remains extremely challenging due to the drastic appearance differences between aerial and ground views. Existing methods extract global representations of the different views primarily with Siamese-like architectures, but seldom take the benefits of interaction between the views into account. In this paper, we present a novel approach that combines cross-view knowledge generation techniques with transformers, namely mutual generative transformer learning (MGTL), for CVGL. Specifically, by taking the initial…
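The retrieval formulation described in the abstract (matching one ground-level image against a gallery of geo-tagged aerial images) can be illustrated with a minimal sketch. This is not the MGTL model: it assumes each image has already been encoded into a fixed-length descriptor by some network, and it ranks aerial candidates by cosine similarity, which is a common choice for this kind of matching rather than something the summary above confirms. The function names, the toy descriptors, and the coordinates are all hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_aerial_candidates(ground_descriptor, aerial_gallery):
    """Return the gallery's geo-tags ordered from best to worst match.

    aerial_gallery is a list of (geo_tag, descriptor) pairs; the descriptors
    are assumed to come from an image encoder that is not shown here.
    """
    query = l2_normalize(ground_descriptor)
    scored = []
    for geo_tag, descriptor in aerial_gallery:
        candidate = l2_normalize(descriptor)
        similarity = sum(q * c for q, c in zip(query, candidate))
        scored.append((similarity, geo_tag))
    # Highest cosine similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [geo_tag for _, geo_tag in scored]


def recall_at_k(ranked_tags, true_tag, k):
    """True if the correct aerial image is among the top-k retrieved results."""
    return true_tag in ranked_tags[:k]


# Toy gallery: (latitude, longitude) tags with hand-made 3-D descriptors.
gallery = [
    ((30.0, 104.0), [0.9, 0.1, 0.0]),
    ((30.1, 104.1), [0.1, 0.9, 0.2]),
    ((30.2, 104.2), [0.0, 0.2, 0.9]),
]
ranking = rank_aerial_candidates([0.2, 1.0, 0.1], gallery)
```

The geo-tag of the top-ranked aerial image serves as the location estimate for the ground camera, and cross-view retrieval quality is typically summarized with recall at K, computed per query as in `recall_at_k` and averaged over all queries.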
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications
