MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Vishal Nedungadi; Ankit Kariryaa; Stefan Oehmcke; Serge Belongie; Christian Igel; Nico Lang

arXiv:2405.02771 · cs.CV · July 30, 2024 · 1 citation

Open Access · 2 code repositories

TL;DR

This paper introduces MMEarth, a large-scale multi-modal pretraining dataset of Earth observation data covering 1.2 million locations, and proposes MP-MAE (Multi-Pretext Masked Autoencoder), a masked autoencoder trained with multi-modal pretext tasks that improves representation learning for optical satellite images across a range of downstream tasks.

Contribution

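The paper presents MMEarth, a new large-scale, globally distributed multi-modal dataset, and MP-MAE, a multi-pretext pretraining method built on the ConvNeXt V2 architecture that learns better optical satellite image representations than MAEs pretrained on ImageNet or on domain-specific satellite images.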

Findings

01

MP-MAE outperforms MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images on downstream tasks.

02

Multi-modal pretext tasks improve linear probing performance.

03

Pretraining enhances label and parameter efficiency for global applications.
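Finding 02 refers to linear probing, a standard way to evaluate pretrained representations: the encoder is frozen and only a linear classifier is trained on its features. The NumPy sketch below illustrates that protocol in general and is not the paper's evaluation code. `frozen_encoder` is a hypothetical stand-in (a fixed random projection plus ReLU) for a pretrained encoder such as MP-MAE's ConvNeXt V2 backbone. Feature standardization, plain gradient descent, the learning rate, and the synthetic data are all illustrative assumptions.

```python
import numpy as np


def frozen_encoder(images, enc_weights):
    """Hypothetical stand-in for a pretrained encoder: fixed projection + ReLU.

    During linear probing these weights are never updated.
    """
    return np.maximum(images @ enc_weights, 0.0)


def standardize(features, eps=1e-6):
    """Zero-mean, unit-variance features (an assumed, common pre-probe step)."""
    return (features - features.mean(axis=0)) / (features.std(axis=0) + eps)


def train_linear_probe(features, labels, num_classes, lr=0.1, steps=500):
    """Fit only a linear softmax classifier on frozen features by gradient descent."""
    n, d = features.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[labels]
    for _ in range(steps):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad_logits = (probs - onehot) / n  # d(mean cross-entropy)/d(logits)
        W -= lr * features.T @ grad_logits
        b -= lr * grad_logits.sum(axis=0)
    return W, b


def probe_accuracy(features, labels, W, b):
    """Fraction of samples whose highest-scoring class matches the label."""
    return float(((features @ W + b).argmax(axis=1) == labels).mean())


# Toy usage: two synthetic classes of 8-dimensional inputs.
rng = np.random.default_rng(0)
images = np.vstack([rng.normal(-2.0, 0.5, size=(50, 8)),
                    rng.normal(2.0, 0.5, size=(50, 8))])
labels = np.array([0] * 50 + [1] * 50)
enc_weights = rng.normal(size=(8, 16))
enc_before = enc_weights.copy()

feats = standardize(frozen_encoder(images, enc_weights))
W, b = train_linear_probe(feats, labels, num_classes=2)
print(probe_accuracy(feats, labels, W, b))
```

For brevity the toy example reports accuracy on the same features it was trained on; a real probe would report accuracy on a held-out split. Because only `W` and `b` are learned, probe accuracy reflects how linearly separable the frozen features already are, which is why it is a common yardstick for comparing pretraining objectives.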

Abstract

The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create MMEarth, a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is…
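The abstract describes MP-MAE as a masked autoencoder trained on several multi-modal pretext tasks. The NumPy sketch below shows one way such an objective could be assembled: patches of the input are randomly masked, the model predicts each modality, and per-modality reconstruction errors on the masked patches are summed. It is not the authors' implementation. The modality names, the 0.6 mask ratio, the use of masked MSE for every task, and uniform task weights are assumptions for illustration; the paper's actual task set and per-task losses may differ, for example for image-level or categorical targets.

```python
import numpy as np


def random_patch_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask over patches; True marks a patch hidden from the encoder."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked patches only, as in a standard MAE."""
    # pred, target: (num_patches, patch_dim); mask: (num_patches,)
    per_patch = ((pred - target) ** 2).mean(axis=1)
    return float(per_patch[mask].mean())


def multi_pretext_loss(preds, targets, mask, weights=None):
    """Weighted sum of per-modality reconstruction losses.

    Each key in `targets` is one pretext task (one modality to reconstruct).
    Uniform default weights are an assumption, not the paper's setting.
    """
    if weights is None:
        weights = {name: 1.0 for name in targets}
    return sum(weights[name] * masked_mse(preds[name], targets[name], mask)
               for name in targets)


# Toy usage with hypothetical modality names, each already split into patches.
rng = np.random.default_rng(0)
num_patches, patch_dim = 64, 16
mask = random_patch_mask(num_patches, mask_ratio=0.6, rng=rng)
targets = {m: rng.normal(size=(num_patches, patch_dim))
           for m in ["sentinel2", "sentinel1", "elevation"]}
preds = {m: t + 1.0 for m, t in targets.items()}  # every prediction off by 1
print(multi_pretext_loss(preds, targets, mask))  # approximately 3.0 (1.0 per modality)
```

Sharing one mask across all modalities means every pretext task must be solved from the same visible patches, which is what pushes the encoder to capture information shared between the optical input and the other modalities.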

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Geographic Information Systems Studies

Methods: ConvNeXt