When SAM2 Meets Video Shadow and Mirror Detection
Leiping Jie

TL;DR
This paper evaluates SAM2's performance on video shadow and mirror detection tasks, revealing its limitations in segmenting rare objects and highlighting areas for future improvement.
Contribution
It is the first to assess SAM2 on video shadow and mirror detection, exposing its current shortcomings in these specialized segmentation tasks.
Findings
SAM2 performs suboptimally on video shadow and mirror detection tasks.
Point prompts lead to lower performance compared to mask prompts.
The study provides insights into SAM2's limitations in rare object segmentation.
Abstract
As the successor to the Segment Anything Model (SAM), the Segment Anything Model 2 (SAM2) not only improves performance in image segmentation but also extends its capabilities to video segmentation. However, its effectiveness in segmenting rare objects that seldom appear in videos remains underexplored. In this study, we evaluate SAM2 on two distinct video segmentation tasks: Video Shadow Detection (VSD) and Video Mirror Detection (VMD). Specifically, we use ground truth point or mask prompts to initialize the first frame and then predict corresponding masks for subsequent frames. Experimental results show, both quantitatively and qualitatively, that SAM2's performance on these tasks is suboptimal, especially when point prompts are used. Code is available at https://github.com/LeipingJie/SAM2Video
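The protocol described in the abstract (prompt the first frame from ground truth, then score the masks predicted for later frames) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper or its repository: the `propagate` callable is a hypothetical stand-in for the video model, the point prompt is chosen as the foreground pixel nearest the mask centroid, and intersection over union (IoU) is used as the score because the abstract does not name its metrics.

```python
from typing import Callable, List, Tuple, Union

Mask = List[List[int]]            # binary mask stored as rows of 0/1
Point = Tuple[int, int]           # (x, y) pixel coordinate
Prompt = Union[Mask, Point]


def point_prompt_from_mask(mask: Mask) -> Point:
    """Pick the foreground pixel closest to the foreground centroid.

    Assumption: the paper may sample its point prompts differently.
    """
    fg = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not fg:
        raise ValueError("mask has no foreground pixels")
    cx = sum(x for x, _ in fg) / len(fg)
    cy = sum(y for _, y in fg) / len(fg)
    return min(fg, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


def iou(pred: Mask, gt: Mask) -> float:
    """Intersection over union of two binary masks of equal size."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    # Two empty masks agree perfectly by convention.
    return inter / union if union else 1.0


def evaluate_video(
    gt_masks: List[Mask],
    propagate: Callable[[Prompt, int], List[Mask]],  # hypothetical model wrapper
    use_mask_prompt: bool,
) -> float:
    """Prompt on frame 0, then average IoU over the remaining frames."""
    first = gt_masks[0]
    prompt: Prompt = first if use_mask_prompt else point_prompt_from_mask(first)
    preds = propagate(prompt, len(gt_masks))
    # Frame 0 is skipped because its ground truth supplied the prompt.
    scores = [iou(p, g) for p, g in zip(preds[1:], gt_masks[1:])]
    return sum(scores) / len(scores) if scores else 0.0
```

To run this against a real model, `propagate` would wrap the video predictor so that it accepts either prompt type and returns one mask per frame. Comparing the score with `use_mask_prompt=True` against `use_mask_prompt=False` then mirrors the point-versus-mask comparison reported in the findings.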
Taxonomy
Topics: Digital Media Forensic Detection · Infrared Target Detection Methodologies · Face recognition and analysis
