Remote Sensing Large Vision-Language Model: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling

Sungjune Park, Yeongyun Kim, Se Yeon Kim, and Yong Man Ro

arXiv:2506.21863·cs.CV·June 30, 2025

Open Access

TL;DR

This paper introduces a specialized large vision-language model for remote sensing that uses multi-level semantic alignment and expert modeling to improve scene understanding and task performance.

Contribution

It proposes a novel framework with semantic augmentation and expert modeling tailored for remote sensing, addressing domain differences from natural images.

Findings

01 Achieves consistent improvements on remote sensing tasks.

02 Effectively bridges the gap between general LVLMs and RS-specific understanding.

03 Enhances multi-level semantic understanding in RS imagery.

Abstract

Large Vision and Language Models (LVLMs) have shown strong performance across various vision-language tasks in natural image domains. However, their application to remote sensing (RS) remains underexplored due to significant domain differences in visual appearances, object scales, and semantics. These discrepancies hinder the effective understanding of RS scenes, which contain rich, multi-level semantic information spanning coarse to fine levels, and thus limit the direct adaptation of existing LVLMs to RS imagery. To address this gap, we propose a novel LVLM framework tailored for RS understanding, incorporating two core components: Semantic-augmented Multi-level Alignment and Semantic-aware Expert Modeling. First, to align multi-level visual features, we introduce the retrieval-based Semantic Augmentation Module, which enriches the visual features with relevant semantics across…

Taxonomy

Topics: Multimodal Machine Learning Applications · Remote-Sensing Image Classification · Domain Adaptation and Few-Shot Learning