Co-visual pattern augmented generative transformer learning for automobile geo-localization

Jianwei Zhao; Qiang Zhai; Pengbo Zhao; Rui Huang; Hong Cheng

arXiv:2203.09135 · cs.CV · April 21, 2023

Open Access

TL;DR

This paper introduces a novel transformer-based method with co-visual pattern augmentation for cross-view geo-localization, significantly improving accuracy in matching aerial and ground images for vehicle localization.

Contribution

It proposes a mutual generative transformer learning framework that leverages cross-view knowledge generation and cascaded attention masking to enhance geo-localization accuracy.

Findings

1. Sets new state-of-the-art results on the CVACT and CVUSA benchmarks.
2. Demonstrates the effectiveness of mutual generative transformers in cross-view matching.
3. Improves accuracy over existing Siamese-like architectures.

Abstract

Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL), which aims to estimate the geographical location of a ground-level camera by matching its image against an enormous database of geo-tagged aerial (e.g., satellite) images, has received considerable attention but remains extremely challenging because of the drastic appearance differences between aerial and ground views. Existing methods extract global representations of each view mainly with Siamese-like architectures and seldom exploit the benefits of interaction between the views. In this paper, we present a novel approach that combines cross-view knowledge generative techniques with transformers, namely mutual generative transformer learning (MGTL), for CVGL. Specifically, by taking the initial…
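The abstract frames CVGL as a retrieval problem: a ground-level query is localized by finding the most similar geo-tagged aerial image. The snippet below is a minimal, generic sketch of that retrieval step only, assuming each image has already been turned into a global descriptor vector (by a Siamese-like network, MGTL, or anything else). It does not implement MGTL. The function names and toy vectors are hypothetical; Recall@K (whether the true aerial match appears among the top K retrieved images) is shown because it is the metric commonly reported on CVUSA and CVACT.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def rank_aerial(ground: Sequence[float], aerial_db: Sequence[Sequence[float]]) -> List[int]:
    """Return aerial-database indices sorted from most to least similar to the ground query.

    Hypothetical helper: assumes descriptors were precomputed by some cross-view encoder.
    """
    g = l2_normalize(ground)
    scores = [sum(a * b for a, b in zip(g, l2_normalize(ref))) for ref in aerial_db]
    return sorted(range(len(aerial_db)), key=lambda i: scores[i], reverse=True)


def recall_at_k(
    ground_queries: Sequence[Sequence[float]],
    aerial_db: Sequence[Sequence[float]],
    true_idx: Sequence[int],
    k: int,
) -> float:
    """Fraction of queries whose true aerial match is among the top-k retrieved images."""
    hits = 0
    for query, target in zip(ground_queries, true_idx):
        if target in rank_aerial(query, aerial_db)[:k]:
            hits += 1
    return hits / len(ground_queries)


if __name__ == "__main__":
    # Toy 2-D "descriptors" for illustration only; real descriptors are high-dimensional.
    aerial_db = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
    ground_queries = [[0.9, 0.1], [0.1, 0.8]]
    true_idx = [0, 2]  # ground-truth aerial index for each query
    print(recall_at_k(ground_queries, aerial_db, true_idx, k=1))  # 0.5
    print(recall_at_k(ground_queries, aerial_db, true_idx, k=2))  # 1.0
```

In this toy case the second query's best match is aerial image 1 rather than its true match 2, so it counts only at K = 2. Better cross-view descriptors, which is what methods such as MGTL aim to learn, move the true aerial image up this ranking.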

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Multimodal Machine Learning Applications