Frequency-Aware Vision-Language Multimodality Generalization Network for Remote Sensing Image Classification
Junjie Zhang, Feng Zhao, Hanqiang Liu, Jun Yu

TL;DR
This paper introduces FVMGN, a frequency-aware multimodal network for remote sensing image classification that enhances cross-scene generalization by leveraging frequency domain features and a diffusion-based augmentation strategy.
Contribution
The work proposes a novel frequency-aware multimodal network with modules for frequency disentanglement, frequency-aware encoding, and multiscale feature alignment, specifically tailored for remote sensing data.
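As a rough illustration of what "frequency disentanglement" can mean in practice, the sketch below splits an image patch into one low-frequency band and three high-frequency detail bands using a single-level 2D Haar wavelet transform. This is an assumption made for illustration only: the summary does not say which wavelet, how many decomposition levels, or what band layout FVMGN uses, and the function name `haar_split` is hypothetical. The code is dependency-free and operates on plain nested lists.

```python
def haar_split(image):
    """Single-level 2D Haar transform of a 2D list with even height and width.

    Returns four half-resolution bands:
      low    - local averages (low-frequency content)
      d_cols - differences between left and right columns of each 2x2 block
      d_rows - differences between top and bottom rows of each 2x2 block
      d_diag - diagonal differences within each 2x2 block
    Illustrative only; not the decomposition used by the paper.
    """
    rows, cols = len(image), len(image[0])
    if rows % 2 or cols % 2:
        raise ValueError("height and width must both be even")

    low, d_cols, d_rows, d_diag = [], [], [], []
    for r in range(0, rows, 2):
        low_row, dc_row, dr_row, dd_row = [], [], [], []
        for c in range(0, cols, 2):
            # The four pixels of the current 2x2 block.
            a, b = image[r][c], image[r][c + 1]
            d, e = image[r + 1][c], image[r + 1][c + 1]
            # Orthonormal Haar: each coefficient is scaled by 1/2.
            low_row.append((a + b + d + e) / 2)
            dc_row.append((a - b + d - e) / 2)
            dr_row.append((a + b - d - e) / 2)
            dd_row.append((a - b - d + e) / 2)
        low.append(low_row)
        d_cols.append(dc_row)
        d_rows.append(dr_row)
        d_diag.append(dd_row)
    return low, d_cols, d_rows, d_diag


# A 2x2 patch containing a vertical edge (dark left column, bright right column).
patch = [[0, 4],
         [0, 4]]
low, d_cols, d_rows, d_diag = haar_split(patch)
print(low, d_cols, d_rows, d_diag)
# prints [[4.0]] [[-4.0]] [[0.0]] [[0.0]]
```

The vertical edge appears only in the column-difference band, while the low band keeps the average brightness. Separating coarse structure from edge and texture detail in this way is the general idea behind treating the frequency components with different encoders.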
Findings
FVMGN outperforms state-of-the-art methods in multimodality generalization.
The frequency domain modules improve cross-scene robustness.
Diffusion-based augmentation enriches multimodal land-cover representations.
Abstract
The rapid growth of remote sensing (RS) technology is giving rise to a novel multimodality generalization task, which requires the model to overcome data heterogeneity while possessing powerful cross-scene generalization ability. Moreover, most vision-language models (VLMs) describe surface materials in RS images using universal texts, lacking linguistic prior knowledge specific to the different RS vision modalities. In this work, we formalize RS multimodality generalization (RSMG) as a learning paradigm and propose a frequency-aware vision-language multimodality generalization network (FVMGN) for RS image classification. Specifically, a diffusion-based training-test-time augmentation (DTAug) strategy is designed to reconstruct multimodal land-cover distributions, enriching the input information for FVMGN. Following that, to overcome multimodal heterogeneity, a multimodal wavelet…
Taxonomy
Topics: Remote-Sensing Image Classification · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
