GEO-Bench-2: From Performance to Capability, Rethinking Evaluation in Geospatial AI

Naomi Simumba; Nils Lehmann; Paolo Fraccaro; Hamed Alemohammad; Geeth De Mel; Salman Khan; Manil Maskey; Nicolas Longepe; Xiao Xiang Zhu; Hannah Kerner; Juan Bernabe-Moreno; Alexandre Lacoste

arXiv:2511.15658·cs.CV·February 3, 2026

TL;DR

GEO-Bench-2 introduces a comprehensive, standardized evaluation framework for Geospatial Foundation Models, enabling nuanced performance assessment across diverse tasks and datasets, and highlighting the importance of task-specific model selection.

Contribution

GEO-Bench-2 provides a flexible, capability-based benchmarking protocol and a diverse dataset suite, supporting fair comparison of GeoFMs and research into model adaptation strategies.

Findings

01 No single model dominates across all tasks

02 Pretraining on natural images benefits high-resolution tasks

03 EO-specific models excel in multispectral applications

Abstract

Geospatial Foundation Models (GeoFMs) are transforming Earth Observation (EO), but evaluation lacks standardized protocols. GEO-Bench-2 addresses this with a comprehensive framework spanning classification, segmentation, regression, object detection, and instance segmentation across 19 permissively-licensed datasets. We introduce "capability" groups to rank models on datasets that share common characteristics (e.g., resolution, bands, temporality). This enables users to identify which models excel in each capability and determine which areas need improvement in future work. To support both fair comparison and methodological innovation, we define a prescriptive yet flexible evaluation protocol. This not only ensures consistency in benchmarking but also facilitates research into model adaptation strategies, a key and open challenge in advancing GeoFMs for downstream tasks. Our…
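
The idea of ranking models within capability groups can be illustrated with a small sketch. This is not the paper's aggregation procedure, which the abstract does not specify; it assumes a plain mean of per-dataset scores. The group names, dataset names (`ds_a`, `ds_b`, `ds_c`), model names (`model_x`, `model_y`), and all scores are hypothetical placeholders, not results from GEO-Bench-2.

```python
from statistics import mean

# Hypothetical capability groups: each maps to datasets sharing a characteristic.
CAPABILITIES = {
    "high_resolution": ["ds_a", "ds_b"],
    "multispectral": ["ds_b", "ds_c"],
}

# Made-up placeholder scores in [0, 1], higher is better; not results from the paper.
SCORES = {
    "model_x": {"ds_a": 0.80, "ds_b": 0.60, "ds_c": 0.50},
    "model_y": {"ds_a": 0.70, "ds_b": 0.65, "ds_c": 0.75},
}


def rank_by_capability(scores, capabilities):
    """Return {capability: [(model, mean score), ...]} sorted best-first.

    Assumption: a capability score is the unweighted mean over the group's
    datasets. The benchmark itself may normalize or aggregate differently.
    """
    rankings = {}
    for capability, datasets in capabilities.items():
        averaged = [
            (model, mean(per_dataset[d] for d in datasets))
            for model, per_dataset in scores.items()
        ]
        rankings[capability] = sorted(
            averaged, key=lambda pair: pair[1], reverse=True
        )
    return rankings


RANKINGS = rank_by_capability(SCORES, CAPABILITIES)

for capability, ranked in RANKINGS.items():
    print(capability, [model for model, _ in ranked])
```

With these placeholder numbers, a different model leads each group, which is the kind of per-capability picture a single overall average would hide.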

Taxonomy

Topics: Remote-Sensing Image Classification · Geographic Information Systems Studies · Remote Sensing in Agriculture