Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization

Yuqi Chen; Xiaohan Zhang; Ahmad Arrabi; Waqas Sultani; Chen Chen; Safwan Wshah

arXiv:2604.10721 · cs.CV · April 14, 2026

TL;DR

This paper adapts Multimodal Large Language Models (MLLMs) for natural-language guided cross-view geo-localization. A simple, parameter-efficient fine-tuning approach achieves state-of-the-art results and outperforms baseline methods.

Contribution

It introduces a simple, effective framework for fine-tuning MLLMs for Natural-language Guided Cross-view Geo-localization (NGCG), enabling strong cross-modal alignment without complex architectural changes.

Findings

1. Achieved 12.2% improvement in Text-to-Image Recall@1 on GeoText-1652.

2. Secured top performance in 5 out of 12 subtasks on CVG-Text.

3. Surpassed baseline methods with fewer trainable parameters.
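
For readers unfamiliar with the Recall@1 metric cited in the findings, the following is a minimal sketch of how Text-to-Image Recall@K is commonly computed. It is not the authors' evaluation code. The function name `recall_at_k`, the toy similarity matrix, and the assumption of exactly one correct gallery image per text query are all illustrative; real benchmarks may allow several correct matches per query.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]],
                ground_truth: Sequence[int],
                k: int = 1) -> float:
    """Fraction of text queries whose correct image is ranked in the top k.

    similarity[i][j] is the score between text query i and gallery image j.
    ground_truth[i] is the gallery index of the correct image for query i
    (assumption: one correct image per query).
    """
    if len(similarity) != len(ground_truth):
        raise ValueError("one ground-truth index is needed per query")
    if not ground_truth:
        raise ValueError("at least one query is required")
    hits = 0
    for scores, target in zip(similarity, ground_truth):
        # Gallery indices sorted by descending similarity score.
        ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
        if target in ranked[:k]:
            hits += 1
    return hits / len(ground_truth)


# Toy example (made-up scores): 3 text queries against 4 satellite images.
toy_similarity = [
    [0.9, 0.1, 0.3, 0.2],  # query 0: correct image 0 is ranked first
    [0.2, 0.4, 0.8, 0.1],  # query 1: correct image 1 is ranked second
    [0.1, 0.2, 0.3, 0.7],  # query 2: correct image 3 is ranked first
]
toy_ground_truth = [0, 1, 3]

r1 = recall_at_k(toy_similarity, toy_ground_truth, k=1)
r2 = recall_at_k(toy_similarity, toy_ground_truth, k=2)
```

In this toy case two of the three queries rank their correct image first, so Recall@1 is 2/3, while Recall@2 is 1.0 because the remaining query finds its match in second place.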

Abstract

Natural-language Guided Cross-view Geo-localization (NGCG) aims to retrieve geo-tagged satellite imagery using textual descriptions of ground scenes. While recent NGCG methods commonly rely on CLIP-style dual-encoder architectures, they often suffer from weak cross-modal generalization and require complex architectural designs. In contrast, Multimodal Large Language Models (MLLMs) offer powerful semantic reasoning capabilities but are not directly optimized for retrieval tasks. In this work, we present a simple yet effective framework to adapt MLLMs for NGCG via parameter-efficient finetuning. Our approach optimizes latent representations within the MLLM while preserving its pretrained multimodal knowledge, enabling strong cross-modal alignment without redesigning model architectures. Through systematic analysis of diverse variables, from model backbone to feature aggregation, we…
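
The abstract names feature aggregation as one of the variables studied when turning an MLLM into a retriever. As a rough illustration of what that step can look like, the sketch below pools token-level hidden states into a single embedding and ranks gallery items by cosine similarity. Everything here is an assumption made for illustration: the abstract does not say which pooling strategies the authors compare, mean pooling and last-token pooling are simply two common choices, and the helper names and toy hidden states are hypothetical stand-ins for real model outputs.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_pool(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Average token-level hidden states into one vector (assumed option)."""
    n = len(hidden_states)
    dim = len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / n for d in range(dim)]


def last_token_pool(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Use the final token's hidden state as the embedding (assumed option)."""
    return list(hidden_states[-1])


def l2_normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(l2_normalize(a), l2_normalize(b)))


def retrieve(query: Sequence[float], gallery: Sequence[Sequence[float]]) -> List[int]:
    """Return gallery indices sorted by descending cosine similarity."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy hidden states standing in for MLLM outputs (2-dimensional for clarity).
text_hidden = [[1.0, 0.0], [0.8, 0.2], [0.9, 0.1]]
gallery_hidden = [
    [[0.0, 1.0], [0.1, 0.9]],  # satellite image 0
    [[1.0, 0.1], [0.9, 0.0]],  # satellite image 1
]

query_embedding = mean_pool(text_hidden)
gallery_embeddings = [mean_pool(h) for h in gallery_hidden]
ranking = retrieve(query_embedding, gallery_embeddings)
```

Swapping `mean_pool` for `last_token_pool` changes only how the embedding is formed; the ranking step stays the same, which is what makes aggregation easy to study as an isolated variable.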

Code & Models

Repositories

https://yuqichen888.github.io/NGCG-MLLMs-web