OmniEarth: A Benchmark for Evaluating Vision-Language Models in Geospatial Tasks
Ronghao Fu, Haoran Liu, Weijie Zhang, Zhiwen Lin, Xiao Yang, Peng Zhang, Bo Yang

TL;DR
OmniEarth introduces a comprehensive benchmark with 28 geospatial tasks for evaluating vision-language models in Earth observation, highlighting current models' limitations in complex remote sensing scenarios.
Contribution
This work presents OmniEarth, the first systematic benchmark for remote sensing vision-language models, covering perception, reasoning, and robustness in diverse geospatial tasks.
Findings
Existing VLMs struggle with geospatially complex tasks.
The benchmark reveals significant gaps in current models' capabilities.
OmniEarth provides a standardized platform for future model development.
Abstract
Vision-Language Models (VLMs) have demonstrated effective perception and reasoning capabilities on general-domain tasks, leading to growing interest in their application to Earth observation. However, a systematic benchmark for comprehensively evaluating remote sensing vision-language models (RSVLMs) remains lacking. To address this gap, we introduce OmniEarth, a benchmark for evaluating RSVLMs under realistic Earth observation scenarios. OmniEarth organizes tasks along three capability dimensions: perception, reasoning, and robustness. It defines 28 fine-grained tasks covering multi-source sensing data and diverse geospatial contexts. The benchmark supports two task formulations: multiple-choice VQA and open-ended VQA. The latter includes pure text outputs for captioning tasks, bounding box outputs for visual grounding tasks, and mask outputs for segmentation tasks. To reduce…
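The abstract names three kinds of output that are checked automatically (multiple-choice answers, bounding boxes, and masks), and the sketch below shows how each could be scored. It is a minimal illustration under stated assumptions, not the benchmark's actual evaluation protocol: the excerpt does not name any metrics, so exact-match accuracy and intersection-over-union (IoU) are swapped in as common choices. The function names and the (x_min, y_min, x_max, y_max) box layout are hypothetical.

```python
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max). The source does not specify one.
Box = Tuple[float, float, float, float]


def multiple_choice_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of predicted option letters that match the reference, ignoring case and whitespace."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes (assumed grounding metric)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_iou(pred: Sequence[Sequence[int]], truth: Sequence[Sequence[int]]) -> float:
    """Intersection-over-union of two binary masks of equal shape (assumed segmentation metric)."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 0.0


if __name__ == "__main__":
    print(multiple_choice_accuracy(["A", "b", "C"], ["A", "B", "D"]))
    print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 1]]))
```

Captioning outputs are left out on purpose: scoring free text needs a text-similarity metric or a judge model, and the excerpt gives no indication of which one the benchmark uses.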
Taxonomy
Topics: Multimodal Machine Learning Applications · Remote-Sensing Image Classification · Advanced Neural Network Applications
