Cross-Scale Pretraining: Enhancing Self-Supervised Learning for Low-Resolution Satellite Imagery for Semantic Segmentation

John Waithaka; Gustave Bwirayesu; Moise Busogi

arXiv:2601.12964·cs.CV·May 5, 2026

TL;DR

This paper introduces a spatial affinity component for self-supervised pretraining that uses high-resolution (HR) satellite imagery to improve representations of lower-resolution, mid-resolution (MR) imagery and downstream segmentation performance.

Contribution

The authors propose a spatial affinity component that can be added to existing self-supervised learning frameworks. It uses HR imagery during pretraining to learn better representations of MR satellite imagery, which improves segmentation on MR tasks.

Findings

01 Models pretrained with the spatial affinity component outperform models pretrained on HR or MR images alone, in both self-supervised learning frameworks tested.

02 Including HR imagery in pretraining improves MR image representation learning.

03 The method improves downstream segmentation performance on MR tasks.

Abstract

Self-supervised pretraining in remote sensing is mostly done using mid-spatial resolution (MR) image datasets due to their high availability. Given the release of high-resolution (HR) datasets, we ask how HR datasets can be included in self-supervised pretraining to enhance MR image representation learning and downstream segmentation performance on MR tasks. We design a spatial affinity component that can be added to existing self-supervised learning frameworks and that uses HR imagery to learn better representations of MR imagery. We test the spatial affinity component on two self-supervised learning frameworks and show that it outperforms models pretrained on HR or MR images alone.
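The abstract does not specify how the spatial affinity component is computed, so the following is only an illustrative sketch of one generic way an auxiliary cross-scale term could be attached to an existing self-supervised loss. It is not the authors' method. It assumes (hypothetically) that an MR image and a co-located HR image are each encoded into the same number of spatially aligned patch features, and it penalizes differences between their patch-to-patch cosine similarity matrices. The names `cosine_affinity`, `affinity_matching_loss`, `total_loss`, and the `weight` parameter are all invented for this example.

```python
import numpy as np


def cosine_affinity(feats: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between N patch features of shape (N, D)."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-8, None)  # guard against zero vectors
    return unit @ unit.T  # shape (N, N)


def affinity_matching_loss(mr_feats: np.ndarray, hr_feats: np.ndarray) -> float:
    """Mean squared difference between MR and HR patch-affinity matrices.

    Assumption (not from the paper): both inputs describe the same N
    spatial locations; their feature dimensions may differ.
    """
    a_mr = cosine_affinity(mr_feats)
    a_hr = cosine_affinity(hr_feats)
    return float(np.mean((a_mr - a_hr) ** 2))


def total_loss(ssl_loss: float, mr_feats: np.ndarray,
               hr_feats: np.ndarray, weight: float = 1.0) -> float:
    """Add the hypothetical affinity term to a base self-supervised loss."""
    return ssl_loss + weight * affinity_matching_loss(mr_feats, hr_feats)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mr = rng.normal(size=(16, 32))  # 16 patches, 32-dim MR features
    hr = rng.normal(size=(16, 64))  # same 16 locations, 64-dim HR features
    print(total_loss(0.5, mr, hr, weight=0.1))
```

Because the term compares affinity matrices rather than raw features, the MR and HR encoders in this sketch do not need matching feature dimensions, only matching patch grids. Whether the paper's component works this way is not stated in the abstract.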
