GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

Sahiti Yerramilli; Nilay Pande; Rynaa Grover; Jayant Sravan Tamarapalli

arXiv:2506.00785 · cs.AI · September 10, 2025

TL;DR

GeoChain is a comprehensive benchmark dataset designed to evaluate and improve step-by-step geographic reasoning in multimodal large language models, highlighting current challenges and guiding future advancements.

Contribution

The paper introduces GeoChain, a large-scale multimodal benchmark that pairs street-level images with step-by-step reasoning chains and annotations for evaluating geographic reasoning in multimodal large language models (MLLMs).

Findings

01 Contemporary MLLMs struggle with visual grounding and accurate localization.

02 Models exhibit erratic reasoning as task complexity increases.

03 GeoChain provides a diagnostic tool for advancing geographic reasoning in MLLMs.

Abstract

This paper introduces GeoChain, a large-scale benchmark for evaluating step-by-step geographic reasoning in multimodal large language models (MLLMs). Leveraging 1.46 million Mapillary street-level images, GeoChain pairs each image with a 21-step chain-of-thought (CoT) question sequence (over 30 million Q&A pairs). These sequences guide models from coarse attributes to fine-grained localization across four reasoning categories (visual, spatial, cultural, and precise geolocation), with each question annotated by difficulty. Images are also enriched with semantic segmentation (150 classes) and a visual locatability score. Our benchmarking of contemporary MLLMs (GPT-4.1 variants, Claude 3.7, Gemini 2.5 variants) on a diverse 2,088-image subset reveals consistent challenges: models frequently exhibit weaknesses in visual grounding, display erratic reasoning, and struggle to achieve accurate localization,…
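The structure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' schema: the `Step` record, its field names, the `difficulty` label format, and the `accuracy_by_category` helper are all hypothetical, and the image count is the rounded "1.46 million" figure from the abstract. The sketch checks the Q&A-pair arithmetic and shows one plausible way to break model accuracy down by reasoning category.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The four reasoning categories named in the abstract."""
    VISUAL = "visual"
    SPATIAL = "spatial"
    CULTURAL = "cultural"
    GEOLOCATION = "precise geolocation"


@dataclass(frozen=True)
class Step:
    """One question in a chain. Field names are hypothetical."""
    index: int           # position in the chain, 1..CHAIN_LENGTH
    question: str        # placeholder text, not taken from the dataset
    category: Category
    difficulty: str      # assumed label format; the abstract only says "annotated by difficulty"


CHAIN_LENGTH = 21        # questions per image, from the abstract
NUM_IMAGES = 1_460_000   # rounded "1.46 million" figure from the abstract


def total_qa_pairs(num_images: int = NUM_IMAGES, chain_length: int = CHAIN_LENGTH) -> int:
    """Each image contributes one Q&A pair per chain step."""
    return num_images * chain_length


def accuracy_by_category(steps, correct):
    """Fraction of correctly answered steps per category.

    `steps` is a sequence of Step, `correct` a parallel sequence of bools.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for step, ok in zip(steps, correct):
        totals[step.category] += 1
        hits[step.category] += int(ok)
    return {cat: hits[cat] / totals[cat] for cat in totals}


if __name__ == "__main__":
    # 1,460,000 images x 21 questions, consistent with "over 30 million Q&A pairs"
    print(total_qa_pairs())
```

Grouping results by category (or, analogously, by difficulty) is one way a benchmark of this shape can serve as a diagnostic tool: it separates failures in visual grounding from failures in the final localization steps.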

Code & Models

Datasets

sahitiy51/geochain
dataset · 67 dl

Videos

GeoChain: Multimodal Chain-of-Thought for Geographic Reasoning

Taxonomy

Topics: Semantic Web and Ontologies · Geographic Information Systems Studies · Service-Oriented Architecture and Web Services