SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo, Jiangwei Lao, Bo Dang, Yingying Zhang, Lei Yu, Lixiang Ru, Liheng Zhong, Ziyuan Huang, Kang Wu, Dingxiang Hu, Huimei He, Jian Wang, Jingdong Chen, Ming Yang, Yongjun Zhang, Yansheng Li

TL;DR
SkySense is a billion-scale multi-modal remote sensing foundation model that takes temporal sequences of optical and SAR imagery as input and improves performance across diverse Earth Observation tasks through multi-granularity contrastive learning and geo-context prototype learning.
Contribution
The paper introduces SkySense, which the authors describe as the largest multi-modal remote sensing foundation model to date, together with two pre-training methods: Multi-Granularity Contrastive Learning and Geo-Context Prototype Learning.
Findings
Outperforms 18 recent RSFMs across 16 datasets and 7 tasks.
Achieves average improvements of 2.76%, 3.67%, and 3.61% over GFM, SatLas, and Scale-MAE, respectively.
Demonstrates strong generalization and flexibility for various remote sensing applications.
Abstract
Prior studies on Remote Sensing Foundation Model (RSFM) reveal immense potential towards a generic model for Earth Observation. Nevertheless, these works primarily focus on a single modality without temporal and geo-context modeling, hampering their capabilities for diverse tasks. In this study, we present SkySense, a generic billion-scale model, pre-trained on a curated multi-modal Remote Sensing Imagery (RSI) dataset with 21.5 million temporal sequences. SkySense incorporates a factorized multi-modal spatiotemporal encoder taking temporal sequences of optical and Synthetic Aperture Radar (SAR) data as input. This encoder is pre-trained by our proposed Multi-Granularity Contrastive Learning to learn representations across different modal and spatial granularities. To further enhance the RSI representations by the geo-context clue, we introduce Geo-Context Prototype Learning to learn…
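To make the idea of contrastive learning "across different modal and spatial granularities" concrete, the following is a minimal sketch only. It is not SkySense's actual objective, which this excerpt does not spell out. It assumes a generic symmetric InfoNCE loss and just two illustrative granularities (image-level and pixel-level) computed between optical and SAR feature maps of the same scenes. The function names `info_nce` and `multi_granularity_loss`, the temperature value, and the array shapes are all hypothetical.

```python
import numpy as np


def info_nce(a: np.ndarray, b: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss for paired embeddings a[i] <-> b[i], shape (N, D).

    Generic contrastive loss used for illustration; not the paper's loss.
    """
    # L2-normalise so that dot products are cosine similarities.
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (N, N); the diagonal holds matching pairs

    def diagonal_cross_entropy(z: np.ndarray) -> float:
        z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
        log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
        return float(-np.mean(np.diag(log_prob)))

    # Average the a->b and b->a directions.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits.T))


def multi_granularity_loss(
    optical: np.ndarray, sar: np.ndarray, temperature: float = 0.1
) -> float:
    """Sum of image-level and pixel-level contrastive terms.

    optical, sar: hypothetical feature maps of shape (N, H, W, D) for the
    same N scenes, spatially aligned. The two granularities are an assumption.
    """
    n, h, w, d = optical.shape
    # Image-level: one pooled vector per scene.
    image_term = info_nce(optical.mean(axis=(1, 2)), sar.mean(axis=(1, 2)), temperature)
    # Pixel-level: every spatial location is its own positive pair.
    pixel_term = info_nce(optical.reshape(-1, d), sar.reshape(-1, d), temperature)
    return image_term + pixel_term


# Usage with random stand-in features (no real imagery involved).
rng = np.random.default_rng(0)
optical_feats = rng.normal(size=(4, 2, 2, 32))
sar_feats = optical_feats + 0.01 * rng.normal(size=optical_feats.shape)
loss = multi_granularity_loss(optical_feats, sar_feats)
```

In this sketch the loss is small when the optical and SAR features of the same scene and location agree, and grows when they are mismatched, which is the behaviour a cross-modal contrastive objective is meant to reward.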
Taxonomy
Topics: Remote-Sensing Image Classification · Data-Driven Disease Surveillance · Geographic Information Systems Studies
Methods: Contrastive Learning · Focus
