How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?

Jose Sosa, Mohamed Aloulou, Danila Rukhovich, Rim Sleimi, Boonyarit Changaival, Anis Kacem, and Djamila Aouada

arXiv:2409.18536 · cs.CV · September 30, 2024

TL;DR

This paper evaluates how effective pre-training large Masked Autoencoders is for Earth Observation tasks, finding that the benefits are task-dependent and that training from scratch can sometimes perform comparably.

Contribution

It provides a comprehensive analysis of ViT-based MAE pre-training effects on various EO downstream tasks, highlighting when pre-training is advantageous.

Findings

01. Pre-training benefits are significant for reconstruction tasks.

02. For segmentation and classification, training from scratch can be equally effective.

03. Pre-training advantages diminish when downstream tasks differ from pre-training objectives.

Abstract

Self-supervised pre-training has proven highly effective for many computer vision tasks, particularly when labelled data are scarce. In the context of Earth Observation (EO), foundation models and various other Vision Transformer (ViT)-based approaches have been successfully applied for transfer learning to downstream tasks. However, it remains unclear under which conditions pre-trained models offer significant advantages over training from scratch. In this study, we investigate the effectiveness of pre-training ViT-based Masked Autoencoders (MAE) for downstream EO tasks, focusing on reconstruction, segmentation, and classification. We consider two large ViT-based MAE pre-trained models: a foundation model (Prithvi) and SatMAE. We evaluate Prithvi on reconstruction and segmentation-based downstream tasks, and for SatMAE we assess its performance on a classification downstream task. Our…

Taxonomy

Topics: Image Processing and 3D Reconstruction · Seismology and Earthquake Studies · Geological Modeling and Analysis

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Layer Normalization · Dense Connections · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding