SatSwinMAE: Efficient Autoencoding for Multiscale Time-series Satellite Imagery

Yohei Nakayama; Jiawei Su; Luis M. Pazos-Outón

arXiv:2405.02512 · cs.CV · October 21, 2024 · 2 cites

Open Access

TL;DR

SatSwinMAE introduces an efficient hierarchical autoencoder with temporal modeling for satellite time-series imagery, significantly improving performance on various geospatial tasks by capturing multi-scale spatio-temporal dependencies.

Contribution

It extends SwinMAE with temporal integration using Video Swin Transformer blocks, enhancing transfer learning and achieving state-of-the-art results in satellite image analysis.

Findings

1. Outperforms existing models in land cover segmentation with 10.4% higher accuracy
2. Achieves significant improvements in flood and wildfire mapping tasks
3. Effectively captures multi-scale spatio-temporal features in satellite data

Abstract

Recent advancements in foundation models have significantly impacted various fields, including natural language processing, computer vision, and multi-modal tasks. One area that stands to benefit greatly is Earth observation, where these models can efficiently process large-scale, unlabeled geospatial data. In this work, we extend the SwinMAE model to integrate temporal information for satellite time-series data. The architecture employs a hierarchical 3D Masked Autoencoder (MAE) with Video Swin Transformer blocks to effectively capture multi-scale spatio-temporal dependencies in satellite imagery. To enhance transfer learning, we incorporate both encoder and decoder pretrained weights, along with skip connections to preserve scale-specific information. This forms an architecture similar to SwinUNet with an additional temporal component. Our approach shows significant performance…
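
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea behind 3D masked autoencoding on an image time series: split the series into spatio-temporal patches, hide a random subset, and score reconstruction on the hidden patches only. The patch sizes, the 75% mask ratio, the zero-filled stand-in for a mask token, and the helper names `patchify_3d`, `random_mask`, and `masked_mse` are all illustrative assumptions, not values or functions taken from the paper.

```python
import numpy as np


def patchify_3d(x, pt, ph, pw):
    """Split a (T, H, W, C) array into non-overlapping 3D patches.

    Returns an array of shape (num_patches, pt * ph * pw * C).
    """
    T, H, W, C = x.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0
    x = x.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Move the three patch-grid axes to the front, patch contents to the back.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)


def random_mask(num_patches, mask_ratio, rng):
    """Return a boolean mask; True marks patches hidden from the encoder."""
    num_masked = int(round(num_patches * mask_ratio))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred, target, mask):
    """MAE-style loss: mean squared error over the masked patches only."""
    return float(np.mean((pred[mask] - target[mask]) ** 2))


rng = np.random.default_rng(0)
# Hypothetical input: 4 acquisition dates, 32x32 pixels, 3 spectral bands.
series = rng.normal(size=(4, 32, 32, 3))
patches = patchify_3d(series, pt=2, ph=8, pw=8)
mask = random_mask(len(patches), mask_ratio=0.75, rng=rng)
# Assumption: hidden patches are zero-filled here as a stand-in for a
# learned mask token, so the patch grid keeps its full shape.
masked_input = np.where(mask[:, None], 0.0, patches)
```

Keeping the full patch grid (rather than dropping hidden patches) is chosen here because window-based attention operates on a regular grid; whether the paper does the same is not stated in the abstract.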

Taxonomy

Topics: Image Retrieval and Classification Techniques · Remote-Sensing Image Classification · Advanced Computational Techniques and Applications

Methods: Attention Is All You Need · Sparse Evolutionary Training · Dropout · Label Smoothing · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer