MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning
Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang

TL;DR
This paper introduces MMEarth, a large-scale multi-modal pretraining dataset for Earth observation (EO) data, and proposes MP-MAE, a Multi-Pretext Masked Autoencoder that learns general-purpose representations for optical satellite images and improves performance across a range of downstream tasks.
Contribution
The paper presents MMEarth, a new global-scale multi-modal dataset covering 1.2 million locations, and MP-MAE, a multi-modal pretraining method that produces better satellite image representations than existing MAE pretraining approaches.
Findings
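MP-MAE outperforms MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images on downstream tasks.
Multi-modal pretext tasks improve linear probing performance compared with pretraining on optical satellite images only.
Multi-modal pretraining improves label and parameter efficiency, both of which are crucial for global-scale applications.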
Abstract
The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is…
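To make the multi-pretext idea concrete, here is a minimal sketch of a combined masked-reconstruction objective, written in plain Python. It assumes, as we read the abstract, that the encoder sees a masked optical image while separate heads predict several modalities, and that each pretext task contributes a reconstruction loss computed only on masked patches, as in a standard MAE. The per-modality mean-squared error, the optional weights (defaulting to 1.0), the one-scalar-per-patch simplification, and the function and modality names (`multi_pretext_loss`, "sentinel2", "elevation") are illustrative assumptions, not the authors' implementation.

```python
import random
from typing import Dict, List, Optional, Sequence


def random_patch_mask(num_patches: int, mask_ratio: float, rng: random.Random) -> List[bool]:
    """Return one flag per patch; True means the patch is hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    hidden = set(rng.sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean squared error over masked patches only (one scalar per patch in this sketch)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def multi_pretext_loss(
    preds: Dict[str, Sequence[float]],
    targets: Dict[str, Sequence[float]],
    mask: Sequence[bool],
    weights: Optional[Dict[str, float]] = None,
) -> float:
    """Weighted sum of per-modality masked reconstruction losses.

    Each key is one pretext task (hypothetical modality names);
    any modality without an explicit weight gets weight 1.0.
    """
    weights = weights or {}
    return sum(
        weights.get(name, 1.0) * masked_mse(preds[name], targets[name], mask)
        for name in targets
    )


# Usage with a fixed mask so the result is easy to check by hand.
mask = [True, False, True, False]
preds = {"sentinel2": [1.0, 0.0, 3.0, 0.0], "elevation": [2.0, 2.0, 2.0, 2.0]}
targets = {"sentinel2": [0.0, 0.0, 1.0, 5.0], "elevation": [2.0, 9.0, 0.0, 9.0]}
print(multi_pretext_loss(preds, targets, mask))  # 4.5
```

Summing the per-task losses lets every extra modality supervise the same shared encoder; in practice the relative weighting of tasks is a design choice this sketch deliberately leaves open.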
Taxonomy
Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Geographic Information Systems Studies
Methods: ConvNeXt
