SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention
Muhammad Nawfal Meeran, Gokul Adethya T, Bhanu Pratyush Mantha

TL;DR
This paper introduces SAM-PM, a spatio-temporal attention module that enhances video camouflaged object detection by enforcing temporal consistency, significantly improving performance while adding minimal additional parameters.
Contribution
The paper proposes a novel SAM Propagation Module that integrates with SAM to improve video camouflaged object detection through spatio-temporal cross-attention mechanisms, training only the module itself.
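The spatio-temporal cross-attention idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that each frame has already been turned into a list of feature tokens (standing in for the output of a frozen SAM encoder), and it omits the learned query/key/value projections that a trainable propagation module would contain. The names `cross_attention` and `propagate` are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention without learned projections.

    queries: tokens of the current frame, each a list of d floats.
    keys, values: tokens gathered from earlier frames (the memory).
    """
    d = len(queries[0])
    out = []
    for q in queries:
        # Similarity of this query token to every memory token.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the memory values.
        out.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return out


def propagate(current_tokens, memory_frames):
    """Let the current frame attend to all tokens of all earlier frames.

    Flattening over frames and positions is what makes the attention
    spatio-temporal in this sketch. The residual sum is an assumption.
    """
    memory = [tok for frame in memory_frames for tok in frame]
    attended = cross_attention(current_tokens, memory, memory)
    return [
        [c + a for c, a in zip(cur, att)]
        for cur, att in zip(current_tokens, attended)
    ]


# One current-frame token attending to two earlier frames of one token each.
current = [[1.0, 0.0]]
earlier = [[[1.0, 0.0]], [[1.0, 0.0]]]
result = propagate(current, earlier)
```

In a trainable version of this sketch, only the parameters inside the propagation step would receive gradient updates, while the encoder that produces the tokens would stay frozen, in line with the training setup described above.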
Findings
Substantial performance gains on VCOD benchmarks.
Effective incorporation of temporal consistency with minimal parameter increase.
Open-source code and pre-trained models available.
Abstract
In the domain of large foundation models, the Segment Anything Model (SAM) has gained notable recognition for its exceptional performance in image segmentation. However, tackling the video camouflaged object detection (VCOD) task presents a unique challenge. Camouflaged objects typically blend into the background, making them difficult to distinguish in still images. Additionally, ensuring temporal consistency in this context is a challenging problem. As a result, SAM encounters limitations and falls short when applied to the VCOD task. To overcome these challenges, we propose a new method called the SAM Propagation Module (SAM-PM). Our propagation module enforces temporal consistency within SAM by employing spatio-temporal cross-attention mechanisms. Moreover, we exclusively train the propagation module while keeping the SAM network weights frozen, allowing us to integrate task-specific…
Taxonomy
Topics: Image Enhancement Techniques · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: Segment Anything Model
