MAESTRO: Masked AutoEncoders for Multimodal, Multitemporal, and Multispectral Earth Observation Data
Antoine Labatie, Michael Vaccaro, Nina Lardiere, Anatol Garioud, Nicolas Gonthier

TL;DR
MAESTRO is a self-supervised learning method for Earth observation data that captures multimodal, multitemporal, and multispectral information. It achieves state-of-the-art results on tasks that strongly rely on multitemporal dynamics and remains competitive on others.
Contribution
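The paper benchmarks fusion strategies and reconstruction-target normalization schemes, then introduces MAESTRO, an adaptation of the Masked Autoencoder with optimized fusion mechanisms and a normalization scheme that incorporates a spectral prior, tailored to Earth observation data.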
Findings
Achieves state-of-the-art performance on tasks that strongly rely on multitemporal dynamics
Remains competitive on other Earth observation tasks
Evaluated on four Earth observation datasets, in both intra- and cross-dataset settings
Abstract
Self-supervised learning holds great promise for remote sensing, but standard self-supervised methods must be adapted to the unique characteristics of Earth observation data. We take a step in this direction by conducting a comprehensive benchmark of fusion strategies and normalization schemes of reconstruction targets for multimodal, multitemporal, and multispectral Earth observation data. Based on our findings, we introduce MAESTRO, a novel adaptation of the Masked Autoencoder with optimized fusion mechanisms and a normalization scheme that incorporates a spectral prior as a self-supervisory signal. Evaluated on four Earth observation datasets in both intra- and cross-dataset settings, MAESTRO achieves state-of-the-art performance on tasks that strongly rely on multitemporal dynamics, while also remaining competitive on others. Code to reproduce all our experiments is available at…
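To make the idea of normalizing reconstruction targets concrete, here is a minimal sketch of the patch-wise target normalization used as an option in the standard Masked Autoencoder, with the loss computed on masked patches only. It is not MAESTRO's scheme: the abstract does not specify how the spectral prior enters the normalization, so none of that is modeled here. The function names, array shapes, and the 75% masking ratio are assumptions chosen for illustration.

```python
import numpy as np


def patchwise_normalized_targets(patches, eps=1e-6):
    """Normalize each patch to zero mean and unit variance.

    patches: array of shape (num_patches, patch_dim).
    This is the generic per-patch normalization from the standard MAE,
    used here only as an illustrative stand-in.
    """
    mean = patches.mean(axis=-1, keepdims=True)
    var = patches.var(axis=-1, keepdims=True)
    return (patches - mean) / np.sqrt(var + eps)


def masked_reconstruction_loss(pred, patches, mask, eps=1e-6):
    """Mean squared error against normalized targets, over masked patches only.

    pred, patches: (num_patches, patch_dim); mask: (num_patches,), 1 = masked.
    """
    target = patchwise_normalized_targets(patches, eps)
    per_patch = ((pred - target) ** 2).mean(axis=-1)
    return float((per_patch * mask).sum() / mask.sum())


# Toy data: 8 patches of 16 values each, 6 of them masked (assumed 75% ratio).
rng = np.random.default_rng(0)
patches = rng.normal(loc=5.0, scale=2.0, size=(8, 16))
mask = np.array([1, 1, 1, 1, 1, 1, 0, 0], dtype=float)

# A prediction equal to the normalized targets gives zero loss; predicting
# all zeros gives a loss close to 1, since normalized targets have unit variance.
perfect = patchwise_normalized_targets(patches)
loss_perfect = masked_reconstruction_loss(perfect, patches, mask)
loss_zero = masked_reconstruction_loss(np.zeros_like(patches), patches, mask)
```

Because the targets are normalized per patch, the loss is insensitive to each patch's absolute brightness and scale, which is the property that different normalization schemes trade off.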
