NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization

Zheyuan Zhang; Runze Li; Tasnim Kabir; Jordan Boyd-Graber

arXiv:2502.14638·cs.CL·February 21, 2025

TL;DR

The paper introduces NaviClues, a dataset of expert geo-localization reasoning derived from GeoGuessr, and Navig, a language-guided framework that reduces average distance error by 14% compared to previous state-of-the-art models while requiring fewer than 1000 training samples.

Contribution

The paper presents NaviClues, a high-quality dataset for geo-localization reasoning, and Navig, a framework that integrates global and fine-grained image information and reasons with language to improve accuracy using fewer than 1000 training samples.

Findings

01. Reduces average distance error by 14% compared to previous state-of-the-art models.

02. Creates the NaviClues dataset from GeoGuessr to supply examples of expert reasoning.

03. Navig requires fewer than 1000 training samples to outperform the previous state of the art.

Abstract

Image geo-localization is the task of predicting the specific location of an image and requires complex reasoning across visual, geographical, and cultural contexts. While prior Vision Language Models (VLMs) have the best accuracy at this task, there is a dearth of high-quality datasets and models for analytical reasoning. We first create NaviClues, a high-quality dataset derived from GeoGuessr, a popular geography game, to supply examples of expert reasoning from language. Using this dataset, we present Navig, a comprehensive image geo-localization framework integrating global and fine-grained image information. By reasoning with language, Navig reduces the average distance error by 14% compared to previous state-of-the-art models while requiring fewer than 1000 training samples. Our dataset and code are available at https://github.com/SparrowZheyuan18/Navig/.
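
The abstract reports results as an average distance error and a relative reduction of that error. A minimal sketch of how such numbers could be computed follows. It assumes the error is the great-circle (haversine) distance in kilometers between predicted and true coordinates, which this page does not confirm. The function names, the Earth radius constant, and the example values are illustrative and are not taken from the Navig codebase.

```python
import math

# Mean Earth radius in kilometers (an assumption of this sketch).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def average_distance_error(predictions, targets):
    """Mean haversine distance over paired (lat, lon) predictions and targets."""
    if not predictions or len(predictions) != len(targets):
        raise ValueError("predictions and targets must be non-empty and equal in length")
    total = sum(
        haversine_km(p[0], p[1], t[0], t[1]) for p, t in zip(predictions, targets)
    )
    return total / len(predictions)


def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error is lower than baseline_error."""
    return (baseline_error - new_error) / baseline_error
```

Under these assumptions, a baseline averaging 100 km and a new model averaging 86 km would give `relative_reduction(100.0, 86.0)` of about 0.14, which is the kind of figure a "14% reduction" describes. These example values are made up for illustration and are not results from the paper.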

Code & Models

Repositories

sparrowzheyuan18/navig
PyTorch · Official

Models

huggingCode11/NAVIG
model

Datasets

huggingCode11/NAVICLUES
dataset · 15 dl

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications · Geographic Information Systems Studies