MGIMM: Multi-Granularity Instruction Multimodal Model for Attribute-Guided Remote Sensing Image Detailed Description
Cong Yang, Zuchao Li, Lefei Zhang

TL;DR
This paper introduces MGIMM, a multimodal model that improves detailed descriptions of remote sensing images by learning the consistency between image regions and their text attributes and by using multi-granularity visual features, addressing the complex object distributions and large scale differences typical of remote sensing data.
Contribution
It proposes a region-attribute guided instruction tuning approach and constructs a new dataset for remote sensing image description, improving model performance in this domain.
Findings
MGIMM outperforms existing methods on the new dataset.
Region-attribute guided learning improves description accuracy.
The constructed dataset facilitates future research on remote sensing image description.
Abstract
Recently, large multimodal models have built a bridge from visual to textual information, but they tend to underperform in remote sensing scenarios. This underperformance is due to the complex distribution of objects and the significant scale differences among targets in remote sensing images, leading to visual ambiguities and insufficient descriptions by these multimodal models. Moreover, the lack of multimodal fine-tuning data specific to the remote sensing field makes it challenging for the model's behavior to align with user queries. To address these issues, this paper proposes an attribute-guided Multi-Granularity Instruction Multimodal Model (MGIMM) for remote sensing image detailed description. MGIMM guides the multimodal model to learn the consistency between visual regions and corresponding text attributes (such as object names, colors, and shapes) through region-level…
Taxonomy
Topics: Advanced Computational Techniques and Applications · Geographic Information Systems Studies · Image Retrieval and Classification Techniques
Methods: ALIGN
