SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing
Yingying Zhang, Lixiang Ru, Kang Wu, Lei Yu, Lei Liang, Yansheng Li, Jingdong Chen

TL;DR
SkySense V2 introduces a unified multi-modal remote sensing foundation model using a single transformer backbone, tailored SSL pre-training, and innovative modules to improve efficiency and performance across diverse Earth observation tasks.
Contribution
It presents a novel unified framework with a tailored SSL strategy, adaptive modules, and MoE integration for multi-modal remote sensing data, reducing redundancy and enhancing generalization.
Findings
Outperforms its predecessor, SkySense, by an average of 1.8 points across 16 datasets.
Demonstrates strong generalization over multiple remote sensing tasks.
Addresses the challenges of varying resolutions and limited feature diversity across modalities.
Abstract
The multi-modal remote sensing foundation model (MM-RSFM) has significantly advanced various Earth observation tasks, such as urban planning, environmental monitoring, and natural disaster management. However, most existing approaches generally require the training of separate backbone networks for each data modality, leading to redundancy and inefficient parameter utilization. Moreover, prevalent pre-training methods typically apply self-supervised learning (SSL) techniques from natural images without adequately accommodating the characteristics of remote sensing (RS) images, such as the complicated semantic distribution within a single RS image. In this work, we present SkySense V2, a unified MM-RSFM that employs a single transformer backbone to handle multiple modalities. This backbone is pre-trained with a novel SSL strategy tailored to the distinct traits of RS data. In particular,…
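The redundancy argument in the abstract can be illustrated with simple parameter accounting. The sketch below is not taken from the paper: the function names, the modality count, and all parameter sizes are hypothetical placeholders. It only contrasts one backbone per modality with a single shared backbone plus a small modality-specific tokenizer.

```python
# Illustrative parameter accounting only. Every number below is a made-up
# placeholder, not a figure reported for SkySense V2.


def separate_backbones(num_modalities: int, backbone_params: int) -> int:
    """Total parameters when each modality trains its own backbone."""
    return num_modalities * backbone_params


def shared_backbone(num_modalities: int, backbone_params: int,
                    tokenizer_params: int) -> int:
    """Total parameters with one shared backbone plus a small
    modality-specific tokenizer (hypothetical) per modality."""
    return backbone_params + num_modalities * tokenizer_params


if __name__ == "__main__":
    modalities = 3            # hypothetical modality count
    backbone = 300_000_000    # hypothetical backbone size
    tokenizer = 1_000_000     # hypothetical per-modality tokenizer size

    separate = separate_backbones(modalities, backbone)
    shared = shared_backbone(modalities, backbone, tokenizer)
    print(f"separate backbones: {separate:,} parameters")
    print(f"shared backbone:    {shared:,} parameters")
    print(f"relative saving:    {1 - shared / separate:.1%}")
```

Under these assumed sizes, the cost of separate backbones grows linearly with the number of modalities, while the shared design grows only by the tokenizer size per added modality.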
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Neural Network Applications · Automated Road and Building Extraction
