GRASP: Guided Region-Aware Sparse Prompting for Adapting MLLMs to Remote Sensing
Qigan Sun, Chaoning Zhang, Jianwei Zhang, Xudong Wang, Jiehui Xie, Pengcheng Zheng, Haoyu Wang, Sungyoung Lee, Chi-lok Andy Tai, Yang Yang, and Heng Tao Shen

TL;DR
This paper introduces GRASP, a parameter-efficient fine-tuning method that enhances multimodal large language models for remote sensing tasks by focusing on relevant regions and filtering background noise.
Contribution
The paper proposes a novel Guided Region-Aware Sparse Prompting (GRASP) strategy that improves remote sensing visual question answering by dynamically focusing on target regions with minimal parameters.
Findings
GRASP achieves competitive performance on RSVQA benchmarks.
It maintains high parameter efficiency compared to existing methods.
The approach effectively filters background noise in remote sensing images.
Abstract
In recent years, Multimodal Large Language Models (MLLMs) have made significant progress in visual question answering tasks. However, directly applying existing fine-tuning methods to remote sensing (RS) images often leads to issues such as overfitting on background noise or neglecting target details. This is primarily due to the large-scale variations, sparse target distributions, and complex regional semantic features inherent in RS images. These challenges limit the effectiveness of MLLMs in RS tasks. To address these challenges, we propose a parameter-efficient fine-tuning (PEFT) strategy called Guided Region-Aware Sparse Prompting (GRASP). GRASP introduces spatially structured soft prompts associated with spatial blocks extracted from a frozen visual token grid. Through a question-guided sparse fusion mechanism, GRASP dynamically aggregates task-specific context into a compact…
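The question-guided sparse fusion described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract is truncated before it says what the context is aggregated into, so everything below is an assumption made for illustration. That includes the average pooling of the token grid into blocks, the scaled dot-product scoring against a question embedding, the top-k selection, the softmax weighting, and the hypothetical names `pool_blocks` and `sparse_fuse`.

```python
import math

def pool_blocks(grid, block):
    """Average-pool an H x W grid of D-dim tokens into non-overlapping
    block x block regions (assumed pooling; returned in row-major order)."""
    H, W, D = len(grid), len(grid[0]), len(grid[0][0])
    assert H % block == 0 and W % block == 0, "grid must tile evenly"
    pooled = []
    for bi in range(0, H, block):
        for bj in range(0, W, block):
            cells = [grid[i][j]
                     for i in range(bi, bi + block)
                     for j in range(bj, bj + block)]
            pooled.append([sum(c[d] for c in cells) / len(cells)
                           for d in range(D)])
    return pooled

def sparse_fuse(grid, prompts, question, block=2, k=2):
    """Fuse the soft prompts of the k blocks most relevant to the question.

    grid:     frozen visual tokens, shape H x W x D (nested lists)
    prompts:  one learnable prompt vector per block (hypothetical layout)
    question: a D-dim question embedding (assumed to share the token space)
    """
    feats = pool_blocks(grid, block)
    assert len(prompts) == len(feats), "one prompt per spatial block"
    scale = math.sqrt(len(question))
    # Relevance of each block to the question (scaled dot product).
    scores = [sum(f * q for f, q in zip(feat, question)) / scale
              for feat in feats]
    # Sparse step: keep only the k highest-scoring blocks.
    top = sorted(range(len(scores)), key=lambda b: scores[b], reverse=True)[:k]
    # Softmax over the kept blocks only; all other blocks get zero weight.
    m = max(scores[b] for b in top)
    exps = [math.exp(scores[b] - m) for b in top]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the selected prompts gives one compact vector.
    fused = [sum(w * prompts[b][p] for w, b in zip(weights, top))
             for p in range(len(prompts[0]))]
    return fused, top, weights
```

In this sketch only the prompt vectors would be trained while the visual token grid stays frozen, which is one way to read the parameter-efficiency claim. Blocks outside the top k contribute nothing to the fused vector, which mirrors the idea of filtering background regions.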
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
