DiffuSAM: Diffusion Guided Zero-Shot Object Grounding for Remote Sensing Imagery

Geet Sethi; Panav Shah; Ashutosh Gandhe; Soumitra Darshan Nayak

arXiv:2604.18201·cs.CV·April 21, 2026

TL;DR

This paper introduces DiffuSAM, a hybrid pipeline that combines diffusion-based localization cues with segmentation models such as RemoteSAM and SAM3 to produce more accurate bounding boxes for object grounding in remote sensing imagery. It reports a gain of over 14% against existing state-of-the-art methods.

Contribution

The work presents a novel pipeline integrating diffusion-based cues with segmentation models for enhanced remote sensing object localization.

Findings

01. Achieved over a 14% increase in [email protected] compared to existing state-of-the-art methods.

02. Demonstrated robust and adaptive object localization across complex scenes.

03. Leveraged the complementary strengths of generative diffusion models and foundational segmentation models.

Abstract

Diffusion models have emerged as powerful tools for a wide range of vision tasks, including text-guided image generation and editing. In this work, we explore their potential for object grounding in remote sensing imagery. We propose a hybrid pipeline that integrates diffusion-based localization cues with state-of-the-art segmentation models such as RemoteSAM and SAM3 to obtain more accurate bounding boxes. By leveraging the complementary strengths of generative diffusion models and foundational segmentation models, our approach enables robust and adaptive object localization across complex scenes. Experiments demonstrate that our pipeline significantly improves localization performance, achieving over a 14% increase in [email protected] compared to existing state-of-the-art methods.
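
The abstract does not say how a localization cue becomes a bounding box, so the following is only a minimal sketch of one plausible step, not the authors' method. It assumes the cue is a 2D heatmap with values in [0, 1], thresholds it, and takes the tightest box around the surviving pixels. The function names `box_from_heatmap` and `box_iou`, and the 0.5 threshold, are hypothetical. Scoring box overlap with intersection-over-union (IoU) is a common convention in grounding work and is also an assumption here.

```python
import numpy as np


def box_from_heatmap(heatmap, threshold=0.5):
    """Tightest (x_min, y_min, x_max, y_max) box around pixels >= threshold.

    Hypothetical helper: the paper's actual conversion step is not described
    in the abstract. Returns None when no pixel passes the threshold.
    """
    mask = heatmap >= threshold
    if not mask.any():
        return None
    ys, xs = np.nonzero(mask)
    # Max coordinates are exclusive, so width = x_max - x_min.
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1


def box_iou(a, b):
    """Intersection-over-union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Toy example: a synthetic 8x8 heatmap with one bright rectangular region.
heatmap = np.zeros((8, 8))
heatmap[2:5, 3:7] = 0.9
predicted = box_from_heatmap(heatmap)       # (3, 2, 7, 5)
reference = (3, 2, 7, 6)                    # made-up reference box
overlap = box_iou(predicted, reference)     # 0.75
```

In a hybrid pipeline of the kind the abstract describes, a coarse box like `predicted` could serve as a prompt for a segmentation model, whose mask would then yield a refined box. That hand-off is an assumption about the design, since the abstract does not detail it.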
