SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention

Muhammad Nawfal Meeran; Gokul Adethya T; Bhanu Pratyush Mantha

arXiv:2406.05802 · cs.CV · July 30, 2024 · 2 cites

TL;DR

This paper introduces SAM-PM, a propagation module that adds spatio-temporal cross-attention to SAM for video camouflaged object detection (VCOD). It enforces temporal consistency across frames and improves performance while adding few parameters.

Contribution

The paper proposes the SAM Propagation Module (SAM-PM), which plugs into SAM and uses spatio-temporal cross-attention to improve video camouflaged object detection. Only the module is trained; SAM's weights stay frozen.

Findings

01. Substantial performance gains on VCOD benchmarks.

02. Effective incorporation of temporal consistency with minimal parameter increase.

03. Open-source code and pre-trained models available.

Abstract

In the domain of large foundation models, the Segment Anything Model (SAM) has gained notable recognition for its exceptional performance in image segmentation. However, the video camouflaged object detection (VCOD) task presents a unique challenge. Camouflaged objects typically blend into the background, making them difficult to distinguish in still images. Ensuring temporal consistency across frames is a further challenge. As a result, SAM falls short when applied to the VCOD task. To overcome these challenges, we propose a new method called the SAM Propagation Module (SAM-PM). Our propagation module enforces temporal consistency within SAM by employing spatio-temporal cross-attention mechanisms. Moreover, we exclusively train the propagation module while keeping the SAM network weights frozen, allowing us to integrate task-specific…
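The spatio-temporal cross-attention idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `cross_attention` and `propagate`, the toy token values, and the residual fusion are assumptions made for illustration, and the learned projections and frozen SAM encoder of a real model are omitted. It uses plain Python with no dependencies so the arithmetic stays visible.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention (hypothetical helper).

    queries: tokens of the current frame, each a list of d floats.
    keys, values: tokens gathered from earlier frames (the memory).
    Returns one output vector per query token.
    """
    d = len(queries[0])
    dim_v = len(values[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every memory token, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the memory values.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim_v)]
        outputs.append(out)
    return outputs


def propagate(current_tokens, past_frames):
    """Fuse current-frame tokens with tokens from past frames (assumed design).

    past_frames is a list of frames; each frame is a list of tokens.
    Flattening them into one memory lets every current token attend over
    both space (token position) and time (frame index).
    """
    memory = [tok for frame in past_frames for tok in frame]
    attended = cross_attention(current_tokens, memory, memory)
    # Residual connection: keep the original features, add temporal context.
    return [[c + a for c, a in zip(cur, att)]
            for cur, att in zip(current_tokens, attended)]


# Toy data: 2 past frames with 2 tokens each, 1 current frame with 2 tokens, d = 2.
past = [[[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]]]
current = [[4.0, 0.0], [0.0, 4.0]]
fused = propagate(current, past)
```

In this toy setup each current token pulls most of its added context from the past tokens it resembles, which is the mechanism that lets information from earlier frames stabilise the prediction for the current one. In a setting like the one the abstract describes, only the parameters of such a module would be trained while the backbone stays fixed.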

Code & Models

Repositories

spidernitt/sam-pm
PyTorch (official)

Taxonomy

Topics: Image Enhancement Techniques · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: Segment Anything Model