In-Domain Self-Supervised Learning Improves Remote Sensing Image Scene Classification
Ivica Dimitrovski, Ivan Kitanovski, Nikola Simidjievski, Dragi Kocev

TL;DR
This paper demonstrates that in-domain self-supervised pre-training on large remote sensing datasets significantly improves the accuracy of remote sensing image scene classification compared with pre-training on generic datasets such as ImageNet.
Contribution
The study shows that in-domain SSL pre-training of vision transformers on large remote sensing datasets improves downstream classification performance.
Findings
In-domain SSL pre-training outperforms generic pre-training methods.
Using the iBOT framework with the Million-AID dataset yields better results.
Performance gains are consistent across 14 diverse datasets.
Abstract
We investigate the utility of in-domain self-supervised pre-training of vision models in the analysis of remote sensing imagery. Self-supervised learning (SSL) has emerged as a promising approach for remote sensing image classification due to its ability to exploit large amounts of unlabeled data. Unlike traditional supervised learning, SSL aims to learn representations of data without the need for explicit labels. This is achieved by formulating auxiliary tasks that can be used for pre-training models before fine-tuning them on a given downstream task. In practice, SSL pre-training commonly uses standard pre-training datasets such as ImageNet. While relevant, such a generic approach can lead to sub-optimal downstream performance, especially on tasks from challenging domains such as remote sensing. In this paper, we analyze the effectiveness…
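The pre-train-then-fine-tune recipe described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' pipeline: the paper pre-trains vision transformers with iBOT on Million-AID, whereas here the "images" are synthetic 10-dimensional vectors, the auxiliary task is input reconstruction with a linear autoencoder (solved in closed form via SVD, since the optimal linear code spans the top principal components), and the downstream step is a nearest-centroid probe on frozen features rather than full fine-tuning. All function names (`sample`, `pretrain_encoder`, `encode`, `fit_probe`, `predict`) are hypothetical.

```python
import numpy as np

# Hypothetical toy setup: each "image" is a 10-dim vector generated from two
# latent factors; only factor 0 determines the scene class.
rng = np.random.default_rng(0)
D, K = 10, 2
Q, _ = np.linalg.qr(rng.normal(size=(D, K)))
MIXING = 3.0 * Q.T  # (K, D) mixing matrix with orthogonal rows


def sample(n, rng):
    """Return (x, y): features of shape (n, D) and balanced binary labels."""
    y = np.arange(n) % 2
    z0 = np.where(y == 1, 2.0, -2.0) + 0.3 * rng.normal(size=n)
    z1 = rng.normal(size=n)  # nuisance factor unrelated to the label
    z = np.stack([z0, z1], axis=1)
    x = z @ MIXING + 0.05 * rng.normal(size=(n, D))
    return x, y


def pretrain_encoder(x_unlabeled, k):
    """Stage 1 (no labels): fit a linear autoencoder by minimising squared
    reconstruction error. Its optimal k-dim code spans the top-k principal
    components, so we solve it in closed form with an SVD."""
    mean = x_unlabeled.mean(axis=0)
    _, _, vt = np.linalg.svd(x_unlabeled - mean, full_matrices=False)
    weights = vt[:k].T  # (D, k) encoder weights with orthonormal columns
    return mean, weights


def encode(x, mean, weights):
    """Map raw inputs to k-dim features with the frozen pre-trained encoder."""
    return (x - mean) @ weights


def fit_probe(feats, y):
    """Stage 2 (few labels): one centroid per class in feature space."""
    return np.stack([feats[y == c].mean(axis=0) for c in (0, 1)])


def predict(feats, centroids):
    """Assign each sample to the class of its nearest centroid."""
    dists = np.linalg.norm(feats[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


x_unlab, _ = sample(2000, rng)      # large unlabeled pool (labels discarded)
x_train, y_train = sample(20, rng)  # small labeled downstream set
x_test, y_test = sample(500, rng)

mean, weights = pretrain_encoder(x_unlab, K)
centroids = fit_probe(encode(x_train, mean, weights), y_train)
accuracy = (predict(encode(x_test, mean, weights), centroids) == y_test).mean()
print(f"downstream accuracy: {accuracy:.3f}")
```

Only the pre-training step sees the 2,000 unlabeled samples; the 20 labeled samples are used solely by the probe. The toy does not compare in-domain against generic pre-training, so it says nothing about the size of the gains reported in the paper.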
Taxonomy
Topics: Remote-Sensing Image Classification · Domain Adaptation and Few-Shot Learning
