Remote Sensing Scene Classification with Masked Image Modeling (MIM)

Liya Wang; Alex Tien

arXiv:2302.14256·cs.CV·March 27, 2023·1 cites

TL;DR

This paper demonstrates that Masked Image Modeling (MIM) pretraining significantly improves remote sensing scene classification accuracy using Vision Transformers, outperforming supervised learning methods and rivaling specialized models.

Contribution

It is the first to systematically evaluate MIM pretraining for remote sensing scene classification, showing substantial performance gains over supervised learning and competitive results with specialized models.

Findings

01

MIM-pretrained ViTs outperform their supervised-pretrained counterparts by up to 5% in top-1 accuracy.

02

Compared to published benchmarks, MIM-pretrained ViT backbones outperform the alternatives by up to 18% in top-1 accuracy.

03

MIM-pretrained ViTs achieve performance comparable to specialized Transformer models.

Abstract

Remote sensing scene classification has been extensively studied for its critical roles in geological survey, oil exploration, traffic management, earthquake prediction, wildfire monitoring, and intelligence monitoring. In the past, Machine Learning (ML) methods for this task mainly used backbones pretrained with supervised learning (SL). Because Masked Image Modeling (MIM), a self-supervised learning (SSL) technique, has been shown to be a better way to learn visual feature representations, it presents a new opportunity for improving ML performance on the scene classification task. This research explores the potential of MIM-pretrained backbones on four well-known classification datasets: Merced, AID, NWPU-RESISC45, and Optimal-31. Compared to the published benchmarks, we show that the MIM-pretrained Vision Transformer (ViT) backbones outperform other…

Taxonomy

Topics: Remote-Sensing Image Classification · Advanced Image and Video Retrieval Techniques · Remote Sensing and Land Use

Methods: Attention Is All You Need · Absolute Position Encodings · Label Smoothing · Softmax · Adam · Layer Normalization · Residual Connection · Dense Connections · Linear Layer · Dropout