TL;DR
This paper applies SAM 3 to remote sensing open-vocabulary segmentation without any training. It introduces a mask fusion strategy and presence score filtering, and evaluates the approach on 20 segmentation datasets.
Contribution
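It presents a training-free method that applies SAM 3 to remote sensing tasks. The method fuses the outputs of SAM 3's semantic segmentation head and its Transformer decoder (instance head) to improve accuracy, and filters categories by presence score to reduce false positives when the vocabulary is large.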
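To make the presence score idea concrete, here is a minimal sketch of how categories might be filtered before pixels are labelled. It is an illustration under assumptions, not the paper's implementation: the function name `filter_by_presence`, the array shapes, and the 0.5 threshold are all hypothetical, and the sketch does not call any real SAM 3 API. It assumes a per-category presence score is already available for the image.

```python
import numpy as np


def filter_by_presence(class_probs, presence_scores, threshold=0.5):
    """Suppress categories judged absent from the image.

    class_probs:     (C, H, W) per-category probability maps (assumed input).
    presence_scores: (C,) per-category presence scores in [0, 1] (assumed input).
    threshold:       hypothetical cut-off; the paper's value is not given here.

    Returns the filtered maps and the boolean keep-mask over categories.
    """
    keep = presence_scores >= threshold
    # Broadcasting zeroes the whole map of every category that is dropped.
    filtered = class_probs * keep[:, None, None]
    return filtered, keep


if __name__ == "__main__":
    probs = np.ones((3, 2, 2))
    presence = np.array([0.9, 0.1, 0.6])
    filtered, keep = filter_by_presence(probs, presence)
    print(keep)
```

Dropping a category for the whole image, rather than per pixel, is what would limit false positives when most of a large vocabulary is absent from any given scene.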
Findings
Achieves promising results on 20 segmentation datasets
Extends to open-vocabulary change detection with joint verification
Demonstrates the potential of SAM 3 for diverse remote sensing tasks
Abstract
Most existing methods for training-free open-vocabulary semantic segmentation are based on CLIP. While these approaches have made progress, they often face challenges in precise localization or require complex pipelines to combine separate modules, especially in remote sensing scenarios where numerous dense and small targets are present. Recently, Segment Anything Model 3 (SAM 3) was proposed, unifying segmentation and recognition in a promptable framework. In this paper, we present a comprehensive exploration of applying SAM 3 to the remote sensing open-vocabulary tasks (i.e., 2D semantic segmentation, change detection, and 3D semantic segmentation) without any training. First, we implement a mask fusion strategy that combines the outputs from SAM 3's semantic segmentation head and the Transformer decoder (instance head). This allows us to leverage the strengths of both heads for…
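The abstract says the two heads are combined but, as excerpted here, does not say how. The sketch below shows one plausible fusion rule for a single category: take the per-pixel maximum of the semantic head's probability map and the score-weighted instance masks. The function name `fuse_masks`, the input shapes, and the max rule itself are assumptions chosen for illustration and may differ from the authors' strategy.

```python
import numpy as np


def fuse_masks(semantic_prob, instance_masks, instance_scores):
    """Fuse one category's semantic map with its instance masks.

    semantic_prob:   (H, W) float map from the semantic head (assumed input).
    instance_masks:  (N, H, W) boolean masks from the instance head (assumed input).
    instance_scores: (N,) confidence per instance mask (assumed input).

    Returns an (H, W) map: the per-pixel maximum of the semantic probability
    and the best score among the instances covering that pixel.
    """
    if instance_masks.shape[0] == 0:
        # No instances predicted: fall back to the semantic head alone.
        return semantic_prob.copy()
    weighted = instance_masks.astype(semantic_prob.dtype) * instance_scores[:, None, None]
    instance_map = weighted.max(axis=0)
    return np.maximum(semantic_prob, instance_map)


if __name__ == "__main__":
    semantic = np.array([[0.2, 0.8], [0.1, 0.0]])
    masks = np.array([[[True, False], [False, True]]])
    scores = np.array([0.9])
    print(fuse_masks(semantic, masks, scores))
```

Under this rule a small object missed by the semantic head can still be recovered from a confident instance mask, while regions the semantic head covers well are left unchanged.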
