Transformer-Guided Convolutional Neural Network for Cross-View Geolocalization

Teng Wang; Shujuan Fan; Daikun Liu; Changyin Sun

arXiv:2204.09967 · cs.CV · April 22, 2022 · 6 cites

TL;DR

This paper introduces TransGCNN, a novel Transformer-guided CNN architecture that improves cross-view geolocalization accuracy by combining local CNN features with global Transformer representations, achieving state-of-the-art results efficiently.

Contribution

The paper proposes a new Transformer-guided CNN model that enhances feature discrimination for ground-to-aerial geolocalization, with a dual-branch Transformer head for multi-scale global feature integration.

Findings

01. Achieves a top-1 accuracy of 94.12% on the CVUSA dataset.

02. Outperforms baselines with fewer parameters and a higher frame rate.

03. Demonstrates a superior accuracy-efficiency tradeoff.
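To make the top-1 accuracy figure concrete, the following is a minimal sketch of how top-1 retrieval accuracy is commonly computed for ground-to-aerial matching. It is not the authors' evaluation code: the function names (`cosine`, `top1_accuracy`), the use of cosine similarity, and the toy descriptors are assumptions made for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top1_accuracy(ground_desc, aerial_desc):
    """Fraction of ground queries whose best-matching aerial descriptor is the true one.

    Assumption: ground_desc[i] and aerial_desc[i] depict the same location.
    """
    hits = 0
    for i, query in enumerate(ground_desc):
        sims = [cosine(query, ref) for ref in aerial_desc]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(ground_desc)

# Toy descriptors (hypothetical values, not taken from the paper).
ground = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.1]]
aerial = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.8, 0.6]]

# The third query is closer to aerial[1] than to its true match aerial[2],
# so two of the three queries are retrieved correctly.
accuracy = top1_accuracy(ground, aerial)
```

On a real benchmark the descriptors would come from the trained network, and the same ranking would be run over the full reference set of aerial images.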

Abstract

Ground-to-aerial geolocalization refers to localizing a ground-level query image by matching it to a reference database of geo-tagged aerial imagery. This is very challenging due to the huge perspective differences in visual appearance and geometric configuration between these two views. In this work, we propose a novel Transformer-guided convolutional neural network (TransGCNN) architecture, which couples CNN-based local features with Transformer-based global representations for enhanced representation learning. Specifically, our TransGCNN consists of a CNN backbone that extracts a feature map from an input image and a Transformer head that models global context from the CNN map. In particular, our Transformer head acts as a spatial-aware importance generator to select salient CNN features as the final feature representation. Such a coupling procedure allows us to leverage a lightweight…
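The idea of a head that generates spatial importance to select CNN features can be illustrated with a small sketch. The abstract does not specify how the selection is carried out, so everything here is an assumption: the sketch normalizes per-position scores with a softmax and takes a weighted sum of the CNN features, and the names `softmax` and `importance_pool` are hypothetical. In the paper the scores would come from the Transformer head; here they are supplied by hand.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def importance_pool(feature_map, importance_logits):
    """Weighted sum of per-position features.

    feature_map: list of N spatial positions, each a C-dimensional feature vector
                 (a flattened CNN feature map).
    importance_logits: list of N scores, one per position (assumed to be produced
                       by some global-context module; hand-written here).
    Returns a single C-dimensional descriptor.
    """
    weights = softmax(importance_logits)
    channels = len(feature_map[0])
    return [
        sum(w * feat[c] for w, feat in zip(weights, feature_map))
        for c in range(channels)
    ]

# Two spatial positions with two channels each (hypothetical values).
features = [[1.0, 0.0], [3.0, 2.0]]

# Equal scores reduce to plain average pooling.
uniform = importance_pool(features, [0.0, 0.0])

# A much larger score on the second position makes the descriptor
# follow that position's features almost exclusively.
focused = importance_pool(features, [0.0, 50.0])
```

The contrast between `uniform` and `focused` shows why learned importance can help: positions judged salient dominate the final descriptor, while uninformative positions contribute little.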

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Robotics and Sensor-Based Localization · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Dropout · Label Smoothing · Adam · Residual Connection · Absolute Position Encodings · Byte Pair Encoding