Earth Science Foundation Models: From Perception to Reasoning and Discovery
Xiangyu Zhao, Bo Liu, Yuehan Zhang, Zelin Song, Wanghan Xu, Feng Liu, Fengxiang Wang, Ben Fei, Fenghua Ling, Wangxu Wei, Wenlong Zhang, Xiao-Ming Wu

TL;DR
This paper reviews the development and application of Earth science foundation models, tracing their evolution from perception to reasoning and agentic workflows, and surveying their use across Earth system domains and multimodal data.
Contribution
It provides a comprehensive framework for understanding Earth foundation models' capabilities and applications, including a review of datasets, benchmarks, and future challenges.
Findings
Compiled over 200 datasets and benchmarks across Earth science tasks.
Reviewed the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows.
Identified key challenges such as data heterogeneity and model reliability.
Abstract
Large foundation models (FMs) are transforming Earth science by integrating heterogeneous multimodal data, such as multi-platform imagery, gridded reanalysis data, diverse geophysical and geochemical observations, and domain-specific text, to support tasks ranging from basic perception to advanced scientific discovery. This paper provides a unified review of Earth science foundation models (Earth FMs) through two complementary dimensions: depth, which traces the evolution of model capabilities from perception to multimodal reasoning and agentic scientific workflows, and breadth, which summarizes their expanding applications across the atmosphere, hydrosphere, lithosphere, biosphere, anthroposphere, and cryosphere, as well as coupled Earth system processes. Using this framework, we review representative multimodal Earth foundation models and compile more than 200 datasets and benchmarks…
