How Does the Spatial Distribution of Pre-training Data Affect Geospatial Foundation Models?
Mirali Purohit, Gedeon Muhawenayo, Esther Rolf, Hannah Kerner

TL;DR
This paper investigates how the geographic distribution of pre-training data influences the performance of Geospatial Foundation Models (GFMs), emphasizing the importance of diversity and global coverage for performance on Earth observation tasks.
Contribution
The paper systematically evaluates the impact of different pre-training data distributions on GFMs, highlighting the benefits of balanced and globally representative data.
Findings
Balanced sampling of pre-training data often outperforms region-specific sampling.
Global coverage in pre-training data improves downstream task performance.
Optimal sampling techniques depend on GFM architecture.
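To make the contrast between balanced and region-specific sampling concrete, here is a minimal sketch of drawing both kinds of subset from a shared pool. It is an illustration only, not the authors' procedure: the region labels, the toy pool, and the function names `balanced_sample` and `region_specific_sample` are all hypothetical, and this summary does not say how the paper defines its regions or sampling schemes.

```python
import random
from collections import Counter, defaultdict


def group_by_region(pool):
    """Map each region label to the list of tile ids that fall in it."""
    groups = defaultdict(list)
    for tile_id, region in pool:
        groups[region].append(tile_id)
    return groups


def balanced_sample(pool, n_total, seed=0):
    """Draw the same number of tiles from every region (hypothetical scheme)."""
    rng = random.Random(seed)
    groups = group_by_region(pool)
    per_region = n_total // len(groups)
    sample = []
    for region in sorted(groups):
        tiles = groups[region]
        # A small region cannot contribute more tiles than it holds.
        k = min(per_region, len(tiles))
        sample.extend((tile, region) for tile in rng.sample(tiles, k))
    return sample


def region_specific_sample(pool, region, n_total, seed=0):
    """Draw every tile from one region only (hypothetical scheme)."""
    rng = random.Random(seed)
    tiles = group_by_region(pool)[region]
    k = min(n_total, len(tiles))
    return [(tile, region) for tile in rng.sample(tiles, k)]


# Toy pool that is deliberately skewed toward one region (assumed data).
pool = (
    [(f"eu_{i}", "Europe") for i in range(600)]
    + [(f"af_{i}", "Africa") for i in range(100)]
    + [(f"as_{i}", "Asia") for i in range(300)]
)

balanced = balanced_sample(pool, n_total=150)
regional = region_specific_sample(pool, "Europe", n_total=150)

balanced_counts = Counter(region for _, region in balanced)
regional_counts = Counter(region for _, region in regional)
print("balanced:", dict(balanced_counts))
print("region-specific:", dict(regional_counts))
```

Both subsets have the same size, so any downstream difference between models pre-trained on them would come from composition rather than data volume. Fixing the subset size in this way is a design choice of the sketch, not a detail taken from the paper.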
Abstract
Foundation models have made rapid advances in many domains, including Earth observation, where Geospatial Foundation Models (GFMs) can help address global challenges such as climate change, agriculture, and disaster response. Previous work on GFMs focused on tailoring model architectures and pretext tasks, and did not investigate the impact of pre-training data selection on model performance. However, recent work in other domains shows that the pre-training data distribution is an important factor influencing the performance of foundation models. With this motivation, our research explores how the geographic distribution of pre-training data affects the performance of GFMs. We evaluated several pre-training data distributions by sampling different compositions from a global data pool. Our experiments with two GFMs on downstream tasks indicate that balanced and globally…
Taxonomy
Topics: Geographic Information Systems Studies · Data Management and Algorithms
