MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang, Jing Zhang, Minqiang Xu, Lin Liu, Dongsheng Wang, Erzhong Gao, Chengxi Han, Haonan Guo, Bo Du, Dacheng Tao, Liangpei Zhang

TL;DR
This paper introduces a Multi-Task Pretraining approach for remote sensing foundation models, improving transferability across diverse RS tasks by training on multiple objectives simultaneously.
Contribution
It proposes a multi-task pretraining paradigm built on a shared encoder with task-specific decoders. The paradigm applies to both CNNs and vision transformers and improves performance on downstream RS tasks.
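The shared-encoder, task-specific-decoder idea can be illustrated with a toy sketch. This is not the authors' implementation: the linear layers, the mean-squared-error loss, the equal task weighting, and every name below (`MultiTaskModel`, `multitask_loss`, `TASKS`) are assumptions chosen for illustration. Real segmentation and rotated-detection decoders are far more elaborate. The sketch only shows one forward pass through a shared encoder and how per-task losses might be summed into a single training objective.

```python
import random

# Task names follow the three pretraining tasks listed in the abstract.
TASKS = ("semantic_segmentation", "instance_segmentation", "rotated_detection")


def make_linear(n_in, n_out, rng):
    """Return a random weight matrix with n_out rows and n_in columns."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def linear(weights, x):
    """Apply a weight matrix to an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MultiTaskModel:
    """Toy model: one shared encoder, one decoder per task (all hypothetical)."""

    def __init__(self, n_in=8, n_feat=4, n_out=2, seed=0):
        rng = random.Random(seed)
        self.encoder = make_linear(n_in, n_feat, rng)  # shared by all tasks
        self.decoders = {t: make_linear(n_feat, n_out, rng) for t in TASKS}

    def forward(self, x):
        # The encoder runs once; every decoder consumes the same features.
        features = [max(0.0, v) for v in linear(self.encoder, x)]  # ReLU
        return {t: linear(w, features) for t, w in self.decoders.items()}


def mse(pred, target):
    """Mean squared error, a stand-in for the real task-specific losses."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multitask_loss(model, x, targets):
    """Sum the per-task losses with equal weights (an assumption)."""
    outputs = model.forward(x)
    per_task = {t: mse(outputs[t], targets[t]) for t in TASKS}
    return sum(per_task.values()), per_task


if __name__ == "__main__":
    model = MultiTaskModel(seed=0)
    x = [1.0] * 8
    targets = {t: [0.0, 0.0] for t in TASKS}
    total, per_task = multitask_loss(model, x, targets)
    print(total, per_task)
```

Because the encoder is the only component shared across tasks, gradients from all three losses would update it jointly during training, while each decoder would receive gradients only from its own task. After pretraining, the decoders could be discarded and the encoder reused for a downstream task.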
Findings
Outperforms existing models of similar size on 14 datasets.
Achieves competitive results with larger state-of-the-art models.
Validates effectiveness of multi-task pretraining in RS domain.
Abstract
Foundation models have reshaped the landscape of Remote Sensing (RS) by enhancing various image interpretation tasks. Pretraining is an active research topic, encompassing supervised and self-supervised learning methods to initialize model weights effectively. However, transferring the pretrained models to downstream tasks may encounter task discrepancy due to their formulation of pretraining as image classification or object discrimination tasks. In this study, we explore the Multi-Task Pretraining (MTP) paradigm for RS foundation models to address this issue. Using a shared encoder and task-specific decoder architecture, we conduct multi-task supervised pretraining on the SAMRS dataset, encompassing semantic segmentation, instance segmentation, and rotated object detection. MTP supports both convolutional neural networks and vision transformer foundation models with over 300 million…
Taxonomy
Topics: Geographic Information Systems Studies · Constraint Satisfaction and Optimization · Distributed and Parallel Computing Systems
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Layer Normalization · Multi-Head Attention · Residual Connection · Vision Transformer
