OlmoEarth: Stable Latent Image Modeling for Multimodal Earth Observation

Henry Herzog; Favyen Bastani; Yawen Zhang; Gabriel Tseng; Joseph Redmon; Hadrien Sablon; Ryan Park; Jacob Morrison; Alexandra Buraczynski; Karen Farley; Joshua Hansen; Andrew Howe; Patrick Alan Johnson; Mark Otterlee; Ted Schmitt; Hunter Pitelka; Stephen Daspit; Rachel Ratner; Christopher Wilhelm; Sebastian Wood; Mike Jacobi; Hannah Kerner; Evan Shelhamer; Ali Farhadi; Ranjay Krishna; Patrick Beukema

arXiv:2511.13655·cs.CV·November 18, 2025

Open Access

TL;DR

OlmoEarth is a multimodal, spatio-temporal foundation model for Earth observation data. It is trained with a self-supervised learning approach designed for the domain and outperforms 12 other foundation models across research benchmarks and real-world tasks.

Contribution

The paper introduces a new self-supervised learning formulation, masking strategy, and loss, all tailored to Earth observation data, and reports state-of-the-art performance.

Findings

01

Outperforms 12 other foundation models on research benchmarks

02

Achieves best performance on 15 out of 24 embedding evaluation tasks

03

Leads in 19 out of 29 tasks after full fine-tuning
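
The task counts above can be read as win rates. The following is a minimal sketch of that arithmetic only; the helper name `win_rate` is an illustrative assumption and does not come from the paper or its code.

```python
def win_rate(wins: int, total: int) -> float:
    """Return the share of tasks won as a percentage, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * wins / total, 1)


# Counts taken from the findings above.
embedding_rate = win_rate(15, 24)    # embedding evaluation tasks
fine_tuning_rate = win_rate(19, 29)  # tasks after full fine-tuning

print(f"Embedding evaluation: {embedding_rate}% of tasks")
print(f"Full fine-tuning: {fine_tuning_rate}% of tasks")
```

The two settings cover different numbers of tasks (24 and 29), so the percentages describe each setting separately rather than a like-for-like comparison.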

Abstract

Earth observation data presents a unique challenge: it is spatial like images, sequential like video or text, and highly multimodal. We present OlmoEarth: a multimodal, spatio-temporal foundation model that employs a novel self-supervised learning formulation, masking strategy, and loss, all designed for the Earth observation domain. OlmoEarth achieves state-of-the-art performance compared to 12 other foundation models across a variety of research benchmarks and real-world tasks from external partners. When evaluating embeddings, OlmoEarth achieves the best performance on 15 out of 24 tasks, and with full fine-tuning it is the best on 19 of 29 tasks. We deploy OlmoEarth as the backbone of an end-to-end platform for data collection, labeling, training, and inference of Earth observation models. The OlmoEarth Platform puts frontier foundation models and powerful data management tools into…

Taxonomy

Topics: Multimodal Machine Learning Applications · Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques