On the Effectiveness of Textual Prompting with Lightweight Fine-Tuning for SAM3 Remote Sensing Segmentation

Roni Blushtein-Livnon; Osher Rafaeli; David Ioffe; Amir Boger; Karen Sandberg Esquenazi; Tal Svoray

arXiv:2512.15564·cs.CV·April 14, 2026


TL;DR

This paper evaluates the SAM3 framework for remote sensing image segmentation, highlighting the effectiveness of combining semantic and geometric prompts with lightweight fine-tuning under limited supervision.

Contribution

The paper shows that hybrid prompting strategies and minimal geometric annotations substantially improve SAM3 segmentation performance on remote sensing imagery.

Findings

01

Hybrid prompts outperform text-only prompts across targets.

02

Light fine-tuning with modest supervision yields substantial gains.

03

Performance gains diminish as supervision increases, indicating that minimal annotation is efficient.

Abstract

Remote sensing (RS) image segmentation is constrained by the limited availability of annotated data and by a gap between overhead imagery and the natural images used to train foundation models. This motivates effective adaptation under limited supervision. The concept-driven SAM3 framework generates masks from textual prompts without requiring task-specific modifications, which may enable this adaptation. We evaluate SAM3 on RS imagery across four target types, comparing textual, geometric, and hybrid prompting strategies under lightweight fine-tuning at scales of increasing supervision, alongside zero-shot inference. Results show that combining semantic and geometric cues yields the highest performance across targets and metrics. Text-only prompting exhibits the lowest performance, with marked score gaps for irregularly shaped targets, reflecting limited semantic alignment between SAM3 textual…
