GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing
Aoran Xiao, Shihao Cheng, Yonghao Xu, Yexian Ren, Hongruixuan Chen, Naoto Yokoya

TL;DR
This paper introduces GeoMMBench, a comprehensive benchmark for evaluating multimodal models in geoscience and remote sensing, and proposes GeoMMAgent, a multi-agent system that enhances model performance through specialized tools.
Contribution
The paper presents a new benchmark for domain-specific multimodal AI in geoscience and remote sensing, and introduces GeoMMAgent, a multi-agent framework that improves performance over standalone models.
Findings
GeoMMBench enables rigorous evaluation across diverse remote sensing (RS) disciplines, sensors, and tasks.
GeoMMAgent outperforms standalone models on geospatial tasks.
Tool-augmented agents are crucial for complex geoscience challenges.
Abstract
Recent advances in multimodal large language models (MLLMs) have accelerated progress in domain-oriented AI, yet their development in geoscience and remote sensing (RS) remains constrained by distinctive challenges: wide-ranging disciplinary knowledge, heterogeneous sensor modalities, and a fragmented spectrum of tasks. To bridge these gaps, we introduce GeoMMBench, a comprehensive multimodal question-answering benchmark covering diverse RS disciplines, sensors, and tasks, enabling broader and more rigorous evaluation than prior benchmarks. Using GeoMMBench, we assess 36 open-source and proprietary large language models, uncovering systematic deficiencies in domain knowledge, perceptual grounding, and reasoning, which are capabilities essential for expert-level geospatial interpretation. Beyond evaluation, we propose GeoMMAgent, a multi-agent framework that strategically integrates retrieval,…
