Adapting Segment Anything Model to Multi-modal Salient Object Detection with Semantic Feature Fusion Guidance

Kunpeng Wang; Danying Lin; Chenglong Li; Zhengzheng Tu; Bin Luo

arXiv:2408.15063 · cs.CV · November 13, 2024

TL;DR

This paper introduces Sammese, a novel framework that adapts the pre-trained Segment Anything Model for multi-modal salient object detection by integrating multi-modal semantic features and prompts, improving detection accuracy in challenging scenes.

Contribution

The paper proposes a multi-modal feature fusion and prompt generation strategy that adapts SAM to multi-modal SOD, drawing on SAM's pre-trained feature representation and zero-shot generalization ability to improve performance.

Findings

01. Effective multi-modal feature integration improves saliency detection.

02. Enhanced SAM performance on RGB-D and RGB-T benchmarks.

03. Framework demonstrates strong zero-shot capabilities.

Abstract

Although most existing multi-modal salient object detection (SOD) methods demonstrate effectiveness through training models from scratch, the limited multi-modal data hinders these methods from reaching optimality. In this paper, we propose a novel framework to explore and exploit the powerful feature representation and zero-shot generalization ability of the pre-trained Segment Anything Model (SAM) for multi-modal SOD. Despite serving as a recent vision fundamental model, driving the class-agnostic SAM to comprehend and detect salient objects accurately is non-trivial, especially in challenging scenes. To this end, we develop SAM with semantic feature fusion guidance (Sammese), which incorporates multi-modal saliency-specific knowledge into SAM to adapt SAM to multi-modal SOD tasks. However, it is difficult for SAM trained on…

Code & Models

Repositories

angknpng/sammese (Official)

Taxonomy

Topics: Visual Attention and Saliency Detection

Methods: Adapter · Segment Anything Model