Position Prediction Self-Supervised Learning for Multimodal Satellite Imagery Semantic Segmentation

John Waithaka; Moise Busogi

arXiv:2506.06852·cs.CV·July 17, 2025

TL;DR

This paper introduces a position prediction self-supervised learning method tailored for multimodal satellite imagery, improving semantic segmentation performance over traditional reconstruction-based methods by emphasizing spatial reasoning and cross-modal interaction.

Contribution

It adapts LOCA for satellite data by extending SatMAE's channel grouping and introducing same-group attention masking, focusing on spatial localization rather than reconstruction.
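As a rough illustration of same-group attention masking, the sketch below builds a boolean mask over tokens tagged with a channel-group id. It assumes one plausible reading of the idea: a token may attend to itself and to tokens from other groups, but not to other tokens in its own group, so that information must flow across groups. The function names, the token layout, and the choice to keep self-attention are hypothetical and are not taken from the paper.

```python
def build_group_ids(num_groups, patches_per_group):
    """Assign a group id to every token, laid out group by group (assumed layout)."""
    return [g for g in range(num_groups) for _ in range(patches_per_group)]


def same_group_attention_mask(group_ids):
    """Return allowed[i][j], True when token i may attend to token j.

    Assumption: attention is blocked between distinct tokens of the same
    group, and allowed across groups and from a token to itself.
    """
    n = len(group_ids)
    return [
        [i == j or group_ids[i] != group_ids[j] for j in range(n)]
        for i in range(n)
    ]


# Three hypothetical channel groups with two patch tokens each.
group_ids = build_group_ids(num_groups=3, patches_per_group=2)
mask = same_group_attention_mask(group_ids)
```

In a transformer, a mask like this would typically be applied by setting the disallowed attention logits to a large negative value before the softmax.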

Findings

01 Outperforms existing self-supervised methods on the Sen1Floods11 dataset.

02 Encourages cross-modal interaction during pretraining.

03 Enhances spatial reasoning for semantic segmentation.

Abstract

Semantic segmentation of satellite imagery is crucial for Earth observation applications, but remains constrained by limited labelled training data. While self-supervised pretraining methods like Masked Autoencoders (MAE) have shown promise, they focus on reconstruction rather than localisation, a fundamental aspect of segmentation tasks. We propose adapting LOCA (Location-aware), a position prediction self-supervised learning method, for multimodal satellite imagery semantic segmentation. Our approach addresses the unique challenges of satellite data by extending SatMAE's channel grouping from multispectral to multimodal data, enabling effective handling of multiple modalities, and introducing same-group attention masking to encourage cross-modal interaction during pretraining. The method uses relative patch position prediction, encouraging spatial reasoning for localisation rather than…
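To make relative patch position prediction concrete, here is a minimal sketch of how classification targets could be derived for such a pretext task. It assumes a square reference view split into a `ref_grid` x `ref_grid` patch grid and a query crop lying inside it; each query patch is labelled with the index of the reference patch that contains its centre. The function name, the box convention, and the centre-based assignment are illustrative assumptions, not the paper's implementation.

```python
def position_targets(ref_grid, query_box, query_grid):
    """Label each query patch with the reference patch index covering its centre.

    query_box is (top, left, height, width) in reference-patch units
    (an assumed convention); indices are row-major over the reference grid.
    """
    top, left, height, width = query_box
    targets = []
    for r in range(query_grid):
        for c in range(query_grid):
            # Centre of the query patch, expressed in reference-patch units.
            cy = top + (r + 0.5) * height / query_grid
            cx = left + (c + 0.5) * width / query_grid
            # Clamp so centres on the far border stay inside the grid.
            row = min(int(cy), ref_grid - 1)
            col = min(int(cx), ref_grid - 1)
            targets.append(row * ref_grid + col)
    return targets


# A 2x2 query crop covering rows 1-2 and columns 2-3 of a 4x4 reference grid.
targets = position_targets(ref_grid=4, query_box=(1.0, 2.0, 2.0, 2.0), query_grid=2)
print(targets)  # [6, 7, 10, 11]
```

A model trained on targets like these would predict, for every query patch, a distribution over reference positions, which is a localisation objective rather than a pixel reconstruction one.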

Taxonomy

Topics: Methane Hydrates and Related Phenomena · Advanced Image and Video Retrieval Techniques · Geochemistry and Geologic Mapping