OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks

Ronghao Fu; Haoran Liu; Weijie Zhang; Zhiwen Lin; Xiao Yang; Peng Zhang; Bo Yang

arXiv:2603.09471 · cs.CV · March 11, 2026

TL;DR

OmniEarth introduces a comprehensive benchmark with 28 geospatial tasks for evaluating vision-language models in Earth observation, highlighting current models' limitations in complex remote sensing scenarios.

Contribution

This work presents OmniEarth, the first systematic benchmark for remote sensing vision-language models, covering perception, reasoning, and robustness in diverse geospatial tasks.

Findings

01 Existing VLMs struggle with geospatially complex tasks.

02 The benchmark reveals significant gaps in current models' capabilities.

03 OmniEarth provides a standardized platform for future model development.

Abstract

Vision-Language Models (VLMs) have demonstrated effective perception and reasoning capabilities on general-domain tasks, leading to growing interest in their application to Earth observation. However, a systematic benchmark for comprehensively evaluating remote sensing vision-language models (RSVLMs) remains lacking. To address this gap, we introduce OmniEarth, a benchmark for evaluating RSVLMs under realistic Earth observation scenarios. OmniEarth organizes tasks along three capability dimensions: perception, reasoning, and robustness. It defines 28 fine-grained tasks covering multi-source sensing data and diverse geospatial contexts. The benchmark supports two task formulations: multiple-choice VQA and open-ended VQA. The latter includes pure text outputs for captioning tasks, bounding box outputs for visual grounding tasks, and mask outputs for segmentation tasks. To reduce…
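The structure the abstract describes (three capability dimensions, two task formulations, and several output types) can be illustrated with a small sketch. This is not the OmniEarth data format or its official scoring code: the names `Capability`, `OutputType`, `Sample`, `choice_accuracy`, and `box_iou` are hypothetical, and the choice of exact-match accuracy for multiple-choice VQA and intersection-over-union (IoU) for bounding boxes is an assumption based on common practice, since the abstract does not state which metrics the benchmark uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence, Tuple


class Capability(Enum):
    """The three capability dimensions named in the abstract."""
    PERCEPTION = "perception"
    REASONING = "reasoning"
    ROBUSTNESS = "robustness"


class OutputType(Enum):
    """Output kinds named in the abstract (enum itself is hypothetical)."""
    CHOICE = "choice"  # multiple-choice VQA
    TEXT = "text"      # open-ended VQA: captioning
    BOX = "box"        # open-ended VQA: visual grounding
    MASK = "mask"      # open-ended VQA: segmentation


# Assumed box convention: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


@dataclass(frozen=True)
class Sample:
    """Hypothetical record for one benchmark item."""
    task: str
    capability: Capability
    output_type: OutputType
    question: str


def choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Exact-match accuracy over option letters, ignoring case and whitespace."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Under these assumptions, a multiple-choice task would be scored with `choice_accuracy` over the predicted option letters, while a grounding task would compare each predicted box to its reference with `box_iou` and apply a threshold chosen by the evaluator.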

Taxonomy

Topics: Multimodal Machine Learning Applications · Remote-Sensing Image Classification · Advanced Neural Network Applications