SAM3-DMS: Decoupled Memory Selection for Multi-target Video Segmentation of SAM3
Ruiqi Shen, Chang Liu, Henghui Ding

TL;DR
SAM3-DMS introduces a training-free, decoupled memory selection method for multi-target video segmentation with SAM3. It selects memory per object rather than per group, improving identity preservation and tracking stability, especially in complex scenes with many objects.
Contribution
It proposes a training-free, decoupled memory selection strategy for SAM3, enhancing multi-object segmentation performance in complex scenarios.
Findings
Improves identity preservation in multi-target segmentation.
Achieves more stable tracking, with the advantage growing as target density increases.
Outperforms original SAM3 in complex multi-object scenes.
Abstract
Segment Anything 3 (SAM3) has established a powerful foundation that robustly detects, segments, and tracks specified targets in videos. However, in its original implementation, its group-level collective memory selection is suboptimal for complex multi-object scenarios, as it employs a synchronized decision across all concurrent targets conditioned on their average performance, often overlooking individual reliability. To this end, we propose SAM3-DMS, a training-free decoupled strategy that utilizes fine-grained memory selection on individual objects. Experiments demonstrate that our approach achieves robust identity preservation and tracking stability. Notably, our advantage becomes more pronounced with increased target density, establishing a solid foundation for simultaneous multi-target video segmentation in the wild.
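The contrast the abstract draws between group-level and decoupled memory selection can be sketched as follows. This is a minimal illustration under stated assumptions, not the SAM3 or SAM3-DMS implementation: the per-object reliability scores, the threshold value, and both function names are hypothetical, and the listing does not describe how reliability is actually measured.

```python
from typing import Dict

# Hypothetical sketch: `scores` maps an object ID to an assumed per-frame
# reliability score in [0, 1]. Neither function is part of SAM3's API.

def group_level_select(scores: Dict[int, float], threshold: float) -> Dict[int, bool]:
    """One synchronized decision: all objects follow the group average."""
    if not scores:
        return {}
    mean_score = sum(scores.values()) / len(scores)
    keep = mean_score >= threshold
    # Every object receives the same decision, regardless of its own score.
    return {obj_id: keep for obj_id in scores}

def decoupled_select(scores: Dict[int, float], threshold: float) -> Dict[int, bool]:
    """Independent decisions: each object is judged on its own score."""
    return {obj_id: score >= threshold for obj_id, score in scores.items()}

# Two reliable targets and one unreliable target in the same frame.
frame_scores = {1: 0.95, 2: 0.90, 3: 0.20}

group = group_level_select(frame_scores, threshold=0.5)
decoupled = decoupled_select(frame_scores, threshold=0.5)

print(group)      # object 3 is admitted to memory because the average is high
print(decoupled)  # object 3 is rejected on its own score
```

In this toy frame the group average stays above the threshold, so the group-level rule writes the unreliable object into memory along with the reliable ones, which is the failure mode the abstract attributes to averaging. The decoupled rule keeps the two reliable objects and skips the third.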
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Advanced Neural Network Applications
