SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery
Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, Stefano Ermon

TL;DR
SatMAE introduces a novel pre-training framework for satellite imagery that leverages temporal and multi-spectral data, significantly improving downstream task performance through masked autoencoding techniques.
Contribution
It develops a Masked Autoencoder-based pre-training method tailored for temporal and multi-spectral satellite imagery, incorporating temporal embeddings and spectral band encodings.
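To make the idea of spectral band encodings concrete, here is a minimal sketch, not the authors' implementation. It assumes each token carries a sinusoidal encoding of its patch position concatenated with a sinusoidal encoding of its band-group index. The band grouping, the dimensions, and the names `sincos_encoding` and `token_encoding` are all hypothetical choices made for illustration.

```python
import math

def sincos_encoding(index, dim):
    """1-D sinusoidal encoding of an integer index (dim must be even)."""
    half = dim // 2
    freqs = [1.0 / (10000 ** (i / half)) for i in range(half)]
    return [math.sin(index * f) for f in freqs] + [math.cos(index * f) for f in freqs]

def token_encoding(patch_index, group_index, pos_dim=8, group_dim=4):
    """Concatenate a spatial encoding with a band-group encoding (assumed layout)."""
    return sincos_encoding(patch_index, pos_dim) + sincos_encoding(group_index, group_dim)

# Hypothetical grouping of 8 spectral bands into 3 groups.
band_groups = [[0, 1, 2], [3, 4, 5], [6, 7]]

# Same patch location, two different band groups: the spatial part is shared,
# while the group part tells the model which bands the token came from.
enc_a = token_encoding(patch_index=5, group_index=0)
enc_b = token_encoding(patch_index=5, group_index=1)
```

Under this layout, tokens from different band groups at the same location share the spatial part of the encoding and differ only in the group part.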
Findings
Up to 7% improvement in supervised benchmark performance.
Up to 14% enhancement in land cover classification accuracy.
Significant gains in semantic segmentation tasks.
Abstract
Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities as unlabelled data is plentiful and the inherent temporal and multi-spectral structure provides avenues to further improve existing pre-training strategies. In this paper, we present SatMAE, a pre-training framework for temporal or multi-spectral satellite imagery based on Masked Autoencoder (MAE). To leverage temporal information, we include a temporal embedding along with independently masking image patches across time. In addition, we demonstrate that encoding multi-spectral data as groups of bands with distinct spectral positional encodings is beneficial. Our approach yields strong improvements over previous state-of-the-art techniques, both in terms of supervised learning…
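The phrase "independently masking image patches across time" can be illustrated with a small sketch. This is an assumption-laden illustration rather than the paper's code: the function name `independent_temporal_mask`, the 16-patch grid, and the 75% mask ratio (the default in the original MAE) are chosen here only for demonstration.

```python
import random

def independent_temporal_mask(num_frames, num_patches, mask_ratio, seed=0):
    """Return, for each frame, the sorted indices of patches left visible."""
    rng = random.Random(seed)
    num_keep = int(num_patches * (1 - mask_ratio))
    kept = []
    for _ in range(num_frames):
        # Each timestamp draws its own random subset, so the masked
        # locations are not tied together across time.
        kept.append(sorted(rng.sample(range(num_patches), num_keep)))
    return kept

# Three timestamps of the same location, 16 patches each, 75% masked.
kept = independent_temporal_mask(num_frames=3, num_patches=16, mask_ratio=0.75)
for t, visible in enumerate(kept):
    print(f"frame {t}: visible patches {visible}")
```

The alternative would be to draw one mask and reuse it for every frame. With independent draws, a location hidden at one timestamp may be visible at another, so the model can use information from other times when reconstructing it.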
Taxonomy
Topics: Remote-Sensing Image Classification · Automated Road and Building Extraction
