Aquila: A Hierarchically Aligned Visual-Language Model for Enhanced Remote Sensing Image Comprehension
Kaixuan Lu, Ruiqian Zhang, Xiao Huang, Yuxing Xie

TL;DR
Aquila is a novel hierarchical visual-language model tailored for remote sensing images, leveraging multi-scale high-resolution features and deep alignment to improve scene understanding beyond existing models.
Contribution
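The paper introduces a learnable Hierarchical Spatial Feature Integration (SFI) module that supports high-resolution inputs and aggregates multi-scale visual features, together with deep visual-language alignment techniques, for enhanced remote sensing image comprehension.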
Findings
Aquila outperforms existing models on quantitative benchmarks.
The model effectively captures complex visual details in remote sensing scenes.
Qualitative analyses show improved interpretability of remote sensing images.
Abstract
Recently, large vision-language models (VLMs) have made significant strides in visual-language capabilities through visual instruction tuning, showing great promise in the field of remote sensing image interpretation. However, existing remote sensing vision-language models (RSVLMs) often fall short in capturing the complex characteristics of remote sensing scenes, as they typically rely on low-resolution, single-scale visual features and simplistic methods to map visual features to language features. In this paper, we present Aquila, an advanced visual-language foundation model designed to enable richer visual feature representation and more precise visual-language feature alignment for remote sensing images. Our approach introduces a learnable Hierarchical Spatial Feature Integration (SFI) module that supports high-resolution image inputs and aggregates multi-scale visual features,…
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques · Remote-Sensing Image Classification
