Hierarchical Fusion of Local and Global Visual Features with Mixture-of-Experts for Remote Sensing Image Scene Classification
Yuanhao Tang, Xuechao Zou, Zhengpei Hu, Junliang Xing, Chengkun Zhang, Jianqiang Huang

TL;DR
This paper introduces a hierarchical fusion model combining local and global visual features with a mixture-of-experts classifier for improved remote sensing scene classification, achieving state-of-the-art accuracy on multiple datasets.
Contribution
It proposes a novel parallel heterogeneous encoder and hierarchical fusion module to effectively integrate multi-scale local and global features for remote sensing image classification.
Findings
Achieves 93.72% accuracy on the AID dataset
Surpasses state-of-the-art methods in accuracy and efficiency
Demonstrates effective local-global feature integration
Abstract
Remote sensing image scene classification remains a challenging task, primarily due to the complex spatial structures and multi-scale characteristics of ground objects. Although CNN-based methods excel at extracting local inductive biases, and Mamba-based approaches demonstrate impressive capabilities in efficiently capturing global sequential context, relying on a single paradigm restricts the model's ability to simultaneously characterize fine-grained textures and complex spatial structures. To tackle this, we propose a parallel heterogeneous encoder with a hierarchical fusion module, designed to achieve effective local-global co-representation. The encoder consists of two parallel pathways: a local visual encoder for extracting multi-scale local visual features, and a global visual encoder for efficiently capturing global visual features. The core innovation lies in its hierarchical fusion module,…
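To make the mixture-of-experts classifier idea concrete, here is a minimal, illustrative sketch in plain Python. It is not the authors' implementation: the abstract is truncated before the fusion module is described, so simple concatenation stands in for fusion, and every name, dimension, and weight here (`moe_classify`, `random_layer`, three experts, five classes) is a hypothetical chosen for the example. It shows only the general pattern of a softmax gate weighting the class distributions of several experts.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def random_layer(rng, n_out, n_in):
    """Hypothetical helper: small random weights and a zero bias."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def moe_classify(local_feat, global_feat, gate, experts):
    """Gate-weighted mixture of expert class distributions.

    Assumption: concatenation is a placeholder for the paper's
    hierarchical fusion, which is not specified in the excerpt.
    """
    fused = local_feat + global_feat  # list concatenation
    gate_weights = softmax(linear(*gate, fused))
    expert_probs = [softmax(linear(w, b, fused)) for w, b in experts]
    n_classes = len(expert_probs[0])
    mixed = [sum(g * p[c] for g, p in zip(gate_weights, expert_probs))
             for c in range(n_classes)]
    return mixed, gate_weights


# Toy usage: random vectors stand in for the two encoder outputs.
rng = random.Random(0)
local_feat = [rng.random() for _ in range(4)]
global_feat = [rng.random() for _ in range(4)]
gate = random_layer(rng, 3, 8)                         # 3 experts
experts = [random_layer(rng, 5, 8) for _ in range(3)]  # 5 classes each
probs, gate_weights = moe_classify(local_feat, global_feat, gate, experts)
```

Because the gate weights sum to one and each expert outputs a probability distribution, the mixed output is itself a valid distribution over classes. A trained model would learn the gate and expert weights jointly rather than drawing them at random.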
Taxonomy
Topics: Remote-Sensing Image Classification · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
