Cross-view Geo-localization with Evolving Transformer

Hongji Yang; Xiufan Lu; Yingying Zhu

arXiv:2107.00842 · cs.CV · July 6, 2021 · 6 cites

TL;DR

This paper introduces EgoTR, a Transformer-based model for cross-view geo-localization that leverages self-attention and a novel self-cross attention mechanism to improve global dependency modeling and geometric understanding, outperforming existing CNN-based methods.

Contribution

The paper presents a new evolving geo-localization Transformer with self-cross attention, enhancing global dependency modeling and geometric correspondence in cross-view geo-localization tasks.
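As a rough illustration of what "global dependency modeling" means here, the sketch below runs generic single-head scaled dot-product self-attention over a handful of image patch tokens with an additive position embedding. It is not the paper's self-cross attention, whose details are not given on this page; the token count, dimensions, random weights, and the names `self_attention` and `pos` are all assumptions made for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v):
    """Generic single-head scaled dot-product self-attention on an (n, d) matrix."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (n, n): every token scores every other token
    weights = softmax(scores, axis=-1)        # each row is a distribution over all tokens
    return weights @ v, weights

rng = np.random.default_rng(0)
n_tokens, dim = 6, 8                          # hypothetical sizes, not from the paper
tokens = rng.normal(size=(n_tokens, dim))     # stand-in for patch features
pos = rng.normal(scale=0.02, size=(n_tokens, dim))  # stand-in for a position embedding
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))

out, weights = self_attention(tokens + pos, w_q, w_k, w_v)
```

Because each row of `weights` spans all tokens, every output token can draw on the whole image in a single layer, whereas a convolution only sees a local neighbourhood.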

Findings

01. EgoTR outperforms state-of-the-art methods on multiple datasets.

02. Self-cross attention improves training stability and generalization.

03. Transformer-based approach reduces reliance on strong geometric assumptions.

Abstract

In this work, we address the problem of cross-view geo-localization, which estimates the geospatial location of a street-view image by matching it against a database of geo-tagged aerial images. The cross-view matching task is extremely challenging due to drastic appearance and geometry differences across views. Unlike existing methods that rely predominantly on CNNs, here we devise a novel evolving geo-localization Transformer (EgoTR) that utilizes the properties of self-attention in the Transformer to model global dependencies, thus significantly decreasing visual ambiguities in cross-view geo-localization. We also exploit the positional encoding of the Transformer to help the EgoTR understand and match geometric configurations between ground and aerial images. Compared to state-of-the-art methods that impose strong assumptions about geometric knowledge, the EgoTR flexibly learns the…
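The matching step described in the abstract can be sketched as a nearest-neighbour lookup: a street-view embedding is compared with every aerial embedding, and the geo-tag of the best match is returned. This is a minimal sketch under stated assumptions, not the paper's pipeline. The embeddings are assumed to come from some trained encoder that is not shown, and the function name `retrieve`, the toy vectors, and the coordinates are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    # Scale vectors to unit length so a dot product equals cosine similarity.
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def retrieve(query_emb, aerial_embs, aerial_coords, top_k=1):
    """Return the top_k (coordinate, similarity) pairs for one street-view query."""
    sims = l2_normalize(aerial_embs) @ l2_normalize(query_emb)  # shape (n_aerial,)
    order = np.argsort(-sims)[:top_k]                            # most similar first
    return [(aerial_coords[i], float(sims[i])) for i in order]

# Toy database: three aerial embeddings with made-up (lat, lon) geo-tags.
aerial_embs = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0],
                        [0.0, 0.0, 1.0]])
aerial_coords = [(40.0, -75.0), (41.0, -74.0), (42.0, -73.0)]

# Hypothetical street-view embedding, closest to the second aerial image.
query_emb = np.array([0.1, 0.9, 0.2])

best = retrieve(query_emb, aerial_embs, aerial_coords, top_k=1)
print(best[0][0])  # geo-tag of the best-matching aerial image
```

Retrieval benchmarks typically report recall at several values of `top_k`, so the function returns a ranked list rather than a single match.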

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Adam · Layer Normalization · Byte Pair Encoding · Dropout · Label Smoothing