InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition

Yijie Zheng; Weijie Wu; Qingyun Li; Xuehui Wang; Xu Zhou; Aiai Ren; Jun Shen; Long Zhao; Guoqing Li; Xue Yang

arXiv:2505.15818 · cs.CV · October 14, 2025

TL;DR

InstructSAM is a training-free, instruction-driven framework for remote sensing object recognition that uses large vision-language models to interpret instructions and identify objects without task-specific training.

Contribution

The paper introduces InstructSAM, a novel training-free approach that interprets instructions for remote sensing object recognition, along with a new benchmark dataset and tasks for open-vocabulary scenarios.

Findings

01. InstructSAM matches or surpasses specialized baselines in multiple tasks.

02. Its inference time stays nearly constant regardless of object count.

03. It reduces output tokens by 89% and runtime by over 32% compared to direct generation.

Abstract

Language-guided object recognition in remote sensing imagery is crucial for large-scale mapping and automated data annotation. However, existing open-vocabulary and visual grounding methods rely on explicit category cues, limiting their ability to handle complex or implicit queries that require advanced reasoning. To address this issue, we introduce a new suite of tasks, including Instruction-Oriented Object Counting, Detection, and Segmentation (InstructCDS), covering open-vocabulary, open-ended, and open-subclass scenarios. We further present EarthInstruct, the first InstructCDS benchmark for earth observation. It is constructed from two diverse remote sensing datasets with varying spatial resolutions and annotation rules across 20 categories, requiring models to interpret dataset-specific instructions. Given the scarcity of semantically rich labeled data in remote sensing, we…

Code & Models

Repositories

VoyagerX · voyagerx/InstructSAM · PyTorch · Official

Videos

InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition · SlidesLive

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques