GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

Aoran Xiao; Shihao Cheng; Yonghao Xu; Yexian Ren; Hongruixuan Chen; Naoto Yokoya

arXiv:2604.08896·cs.CV·April 13, 2026

TL;DR

This paper introduces GeoMMBench, a comprehensive benchmark for evaluating multimodal models in geoscience and remote sensing, and proposes GeoMMAgent, a multi-agent system that enhances model performance through specialized tools.

Contribution

The paper presents a new benchmark for domain-specific multimodal AI in geoscience and remote sensing, and introduces GeoMMAgent, a multi-agent framework that improves performance over standalone models.

Findings

01. GeoMMBench enables rigorous evaluation across diverse RS tasks.

02. GeoMMAgent outperforms standalone LLMs in geospatial tasks.

03. Tool-augmented agents are crucial for complex geoscience challenges.

Abstract

Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning, capabilities that are essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval,…
