GeoVLM: Improving Automated Vehicle Geolocalisation Using Vision-Language Matching

Barkin Dagda; Muhammad Awais; Saber Fallah

arXiv:2505.13669·cs.CV·May 21, 2025

TL;DR

GeoVLM leverages vision-language models and interpretable language descriptions to enhance cross-view geo-localisation accuracy, addressing scene similarity challenges and improving top match retrieval in automated vehicle positioning.

Contribution

Introduces GeoVLM, a novel trainable reranking method using zero-shot vision-language models and natural language descriptions for improved geo-localisation accuracy.

Findings

01. GeoVLM outperforms state-of-the-art methods on the VIGOR and University-1652 benchmarks.

02. GeoVLM significantly improves top-match retrieval accuracy.

03. The approach is effective in real-world driving environments.

Abstract

Cross-view geo-localisation identifies the coarse geographical position of an automated vehicle by matching a ground-level image to a geo-tagged satellite image from a database. Despite advances in cross-view geo-localisation, significant challenges persist, such as similar-looking scenes, which make it difficult to retrieve the correct match as the top match. Existing approaches reach high recall rates but still fail to rank the correct image as the top match. To address this challenge, this paper proposes GeoVLM, a novel approach that uses the zero-shot capabilities of vision-language models to enable cross-view geo-localisation through interpretable cross-view language descriptions. GeoVLM is a trainable reranking approach that improves the best-match accuracy of cross-view geo-localisation. GeoVLM is evaluated on the standard benchmarks VIGOR and University-1652 and also…
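The gap the abstract describes, high recall but a wrong top match, can be illustrated with a generic two-stage reranking sketch. This is not the GeoVLM implementation: the function names, the satellite IDs, and all scores below are hypothetical, and the `description_similarity` table is a made-up stand-in for whatever similarity a vision-language model would assign to language descriptions of the two views.

```python
from typing import Callable, List, Sequence


def rerank_top_k(
    candidates: Sequence[str],
    retrieval_scores: Sequence[float],
    rerank_score: Callable[[str], float],
    k: int = 5,
) -> List[str]:
    """Reorder only the k best first-stage candidates by a second-stage score."""
    # First stage: rank every candidate by its retrieval score, best first.
    order = sorted(
        range(len(candidates)), key=lambda i: retrieval_scores[i], reverse=True
    )
    head, tail = order[:k], order[k:]
    # Second stage: re-sort the shortlist; the tail keeps its first-stage order.
    head = sorted(head, key=lambda i: rerank_score(candidates[i]), reverse=True)
    return [candidates[i] for i in head + tail]


def hit_at_k(ranked: Sequence[str], correct: str, k: int) -> bool:
    """True if the correct reference appears among the first k results."""
    return correct in ranked[:k]


# Hypothetical data: two similar-looking scenes with nearly equal retrieval scores.
satellite_ids = ["sat_a", "sat_b", "sat_c", "sat_d"]
retrieval_scores = [0.91, 0.90, 0.40, 0.10]
# Made-up second-stage scores (assumption, not produced by any real model).
description_similarity = {"sat_a": 0.2, "sat_b": 0.8, "sat_c": 0.9, "sat_d": 0.1}
correct = "sat_b"

# k=0 leaves the first-stage ranking untouched.
first_stage = rerank_top_k(satellite_ids, retrieval_scores, lambda s: 0.0, k=0)
reranked = rerank_top_k(
    satellite_ids, retrieval_scores, description_similarity.__getitem__, k=2
)

print(first_stage, hit_at_k(first_stage, correct, 1), hit_at_k(first_stage, correct, 2))
print(reranked, hit_at_k(reranked, correct, 1))
```

In this toy case the correct image is within the first two results before reranking but not first; the second stage moves it to the top. Because only the shortlist is re-sorted, `sat_c` stays in third place even though its second-stage score is the highest, so the shortlist size `k` bounds how much the reranker can change.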

Code & Models

Repositories

cav-research-lab/geovlm (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques