RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning
Wenhui Diao, Haichen Yu, Kaiyue Kang, Tong Ling, Di Liu, Yingchao Feng, Hanbo Bi, Libo Ren, Xuexue Li, Yongqiang Mao, Xian Sun

TL;DR
RingMo-Aerial is a foundation model for aerial remote sensing (ARS) that uses affine transformation contrastive learning and a frequency-enhanced attention mechanism to improve performance across diverse ARS vision tasks.
Contribution
It introduces a novel foundation model with a frequency-enhanced attention mechanism and affine transformation contrastive learning for better ARS task adaptability.
Findings
Achieves state-of-the-art results on multiple ARS tasks.
Enhances small-object detection capabilities.
Improves model adaptability to tilted viewing angles.
Abstract
Aerial Remote Sensing (ARS) vision tasks present significant challenges due to the unique viewing angle characteristics. Existing research has primarily focused on algorithms for specific tasks, which have limited applicability in a broad range of ARS vision applications. This paper proposes RingMo-Aerial, aiming to fill the gap in foundation model research in the field of ARS vision. A Frequency-Enhanced Multi-Head Self-Attention (FE-MSA) mechanism is introduced to strengthen the model's capacity for small-object representation. Complementarily, an affine transformation-based contrastive learning method improves its adaptability to the tilted viewing angles inherent in ARS tasks. Furthermore, the ARS-Adapter, an efficient parameter fine-tuning method, is proposed to improve the model's adaptability and performance in various ARS vision tasks. Experimental results demonstrate that…
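The abstract does not spell out how the affine transformation-based contrastive objective is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that a positive pair is an image and a randomly affine-warped copy of it (rotation, shear, scale, loosely mimicking a tilted view), and it uses a generic InfoNCE loss as a stand-in. The function names, parameter ranges, and the random linear "encoder" are all hypothetical.

```python
import numpy as np

def random_affine(rng, max_deg=20.0, max_shear=0.3, scale=(0.8, 1.2)):
    """Sample a 2x2 affine matrix (scale * rotation @ shear). Ranges are illustrative assumptions."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    shear = rng.uniform(-max_shear, max_shear)
    s = rng.uniform(*scale)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    shr = np.array([[1.0, shear],
                    [0.0, 1.0]])
    return s * rot @ shr

def warp(image, A):
    """Warp an HxW image by the 2x2 matrix A about its center (nearest neighbor, zero fill)."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Output pixel coordinates relative to the center, stacked as (x, y).
    out = np.stack([xs.ravel() - cx, ys.ravel() - cy])
    # Inverse mapping: find the source location of every output pixel.
    src = np.linalg.inv(A) @ out
    sx = np.rint(src[0] + cx).astype(int)
    sy = np.rint(src[1] + cy).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    warped = np.zeros(h * w, dtype=image.dtype)
    warped[valid] = image[sy[valid], sx[valid]]
    return warped.reshape(h, w)

def info_nce(z1, z2, temperature=0.1):
    """Generic InfoNCE loss: row i of z1 should match row i of z2 (assumed objective)."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
batch = rng.random((8, 32, 32))                       # toy stand-in for aerial images
views = np.stack([warp(img, random_affine(rng)) for img in batch])
W = rng.normal(size=(32 * 32, 64))                    # hypothetical linear encoder
z_orig = batch.reshape(8, -1) @ W
z_view = views.reshape(8, -1) @ W
loss = info_nce(z_orig, z_view)
print(f"contrastive loss: {loss:.4f}")
```

In a real training setup the linear projection would be replaced by the backbone being pre-trained, and the warp would typically be a differentiable, interpolating resampler rather than nearest-neighbor indexing.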
Taxonomy
Topics: Target Tracking and Data Fusion in Sensor Networks · Gaussian Processes and Bayesian Inference · Neural Networks and Applications
Methods: Contrastive Learning
