MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data

Antoine Labatie; Michael Vaccaro; Nina Lardiere; Anatol Garioud; Nicolas Gonthier

arXiv:2508.10894 · cs.CV · October 10, 2025

TL;DR

MAESTRO is a self-supervised learning method that adapts the Masked Autoencoder to multimodal, multitemporal, and multispectral Earth observation data. Evaluated on four Earth observation datasets, it achieves state-of-the-art results on tasks that strongly rely on multitemporal dynamics and remains competitive on others.

Contribution

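The paper benchmarks fusion strategies and reconstruction-target normalization schemes for Earth observation data, then introduces MAESTRO, an adaptation of the Masked Autoencoder with optimized fusion mechanisms and a normalization scheme that incorporates a spectral prior as a self-supervisory signal.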

Findings

01. Achieves state-of-the-art performance on tasks that strongly rely on multitemporal dynamics

02. Remains competitive on other Earth observation tasks

03. Evaluated on four Earth observation datasets, in both intra- and cross-dataset settings

Abstract

Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive benchmark of fusion strategies and normalization schemes of reconstruction targets for multimodal, multitemporal, and multispectral Earth observation data. Based on our findings, we introduce MAESTRO, a novel adaptation of the Masked Autoencoder with optimized fusion mechanisms and a normalization scheme that incorporates a spectral prior as a self-supervisory signal. Evaluated on four Earth observation datasets in both intra- and cross-dataset settings, MAESTRO achieves state-of-the-art performance on tasks that strongly rely on multitemporal dynamics, while also remaining competitive on others. Code to reproduce all our experiments is available at…
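For readers unfamiliar with the baseline that the abstract builds on, the following is a minimal sketch of a generic Masked Autoencoder training objective: random token masking, per-patch normalization of reconstruction targets, and a mean squared error computed on masked tokens only. It is not MAESTRO's method. The spectral-prior normalization and the fusion mechanisms are not specified in this listing, and all function names, shapes, and the 0.75 mask ratio below are illustrative assumptions.

```python
import numpy as np


def random_mask(num_tokens, mask_ratio, rng):
    """Return a boolean array that is True for tokens hidden from the encoder."""
    num_masked = int(round(num_tokens * mask_ratio))
    perm = rng.permutation(num_tokens)
    mask = np.zeros(num_tokens, dtype=bool)
    mask[perm[:num_masked]] = True
    return mask


def normalize_targets(patches, eps=1e-6):
    """Generic per-patch normalization: zero mean and unit variance per token.

    Assumption: this is the plain per-patch scheme, not the spectral-prior
    scheme that the abstract attributes to MAESTRO.
    """
    mean = patches.mean(axis=-1, keepdims=True)
    var = patches.var(axis=-1, keepdims=True)
    return (patches - mean) / np.sqrt(var + eps)


def masked_mse(pred, target, mask):
    """Mean squared error averaged over masked tokens only."""
    per_token = ((pred - target) ** 2).mean(axis=-1)
    return float(per_token[mask].mean())


# Hypothetical input: 16 tokens, each a flattened patch of 48 values.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 48))

mask = random_mask(num_tokens=16, mask_ratio=0.75, rng=rng)
targets = normalize_targets(patches)

# Stand-in for a decoder output; a real model would predict the masked patches.
pred = np.zeros_like(targets)
loss = masked_mse(pred, targets, mask)
```

Normalizing the targets changes what the model is asked to reconstruct, which is why the choice of normalization scheme can act as a design lever in this family of methods.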
