Efficient Remote Sensing with Harmonized Transfer Learning and Modality Alignment
Tengjun Huang

TL;DR
This paper introduces HarMA, a novel method for remote sensing that enhances transfer learning and modality alignment, achieving state-of-the-art results with minimal training overhead and broad applicability.
Contribution
HarMA is a new approach that simultaneously addresses task constraints, modality alignment, and single-modality uniform alignment, improving the efficiency of multimodal transfer learning in remote sensing.
Findings
HarMA achieves state-of-the-art performance in remote sensing retrieval tasks.
HarMA outperforms fully fine-tuned models while training fewer parameters.
HarMA is compatible with existing multimodal pretraining models.
Abstract
With the rise of Visual and Language Pretraining (VLP), an increasing number of downstream tasks are adopting the paradigm of pretraining followed by fine-tuning. Although this paradigm has demonstrated potential in various multimodal downstream tasks, its implementation in the remote sensing domain encounters some obstacles. Specifically, the tendency for same-modality embeddings to cluster together impedes efficient transfer learning. To tackle this issue, we review the aim of multimodal transfer learning for downstream tasks from a unified perspective, and rethink the optimization process based on three distinct objectives. We propose "Harmonized Transfer Learning and Modality Alignment (HarMA)", a method that simultaneously satisfies task constraints, modality alignment, and single-modality uniform alignment, while minimizing training overhead through parameter-efficient…
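The abstract names modality alignment and single-modality uniformity as objectives, but this summary does not give their formulas. The sketch below is an illustration only, not HarMA's actual loss: it assumes a standard symmetric contrastive (InfoNCE) loss for cross-modal alignment and a Gaussian-potential uniformity measure for the within-modality clustering that the abstract describes. The function names, the temperature of 0.07, and the scale t = 2.0 are all hypothetical choices made for this example.

```python
import numpy as np

def l2_normalize(x):
    # Project each embedding onto the unit sphere.
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def info_nce(img, txt, temperature=0.07):
    # Assumed alignment objective: symmetric contrastive loss where
    # row i of `img` is the matching pair for row i of `txt`.
    logits = l2_normalize(img) @ l2_normalize(txt).T / temperature

    def cross_entropy_on_diagonal(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_on_diagonal(logits)
                  + cross_entropy_on_diagonal(logits.T))

def uniformity(x, t=2.0):
    # Assumed uniformity measure within one modality: log of the mean
    # Gaussian potential over all distinct pairs. Values near 0 mean the
    # embeddings are tightly clustered; more negative means more spread out.
    x = l2_normalize(x)
    sq_dist = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    upper = np.triu_indices(len(x), k=1)
    return np.log(np.mean(np.exp(-t * sq_dist[upper])))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spread = rng.normal(size=(32, 16))
    clustered = np.ones((32, 16)) + 0.01 * rng.normal(size=(32, 16))
    print("uniformity (spread):   ", uniformity(spread))
    print("uniformity (clustered):", uniformity(clustered))
    print("alignment (matched):   ", info_nce(spread, spread))
    print("alignment (mismatched):", info_nce(spread, spread[::-1]))
```

Under these assumptions, tightly clustered same-modality embeddings score close to zero on the uniformity measure, which is the failure mode the abstract says impedes transfer learning, while correctly paired embeddings give a lower alignment loss than mismatched ones.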
Taxonomy
Topics: Remote-Sensing Image Classification · Remote Sensing and Land Use
