MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection
Qiushi Yang, Yuan Yao, Miaomiao Cui, Liefeng Bo

TL;DR
MoSAM enhances the Segment Anything Model 2 for video object segmentation by integrating motion cues and dynamic memory selection, significantly improving long-range object tracking and segmentation accuracy.
Contribution
This paper introduces MoSAM, a novel approach that incorporates motion-guided prompting and spatial-temporal memory selection to improve video object segmentation.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Effectively handles object disappearance and occlusion.
Improves long-range object tracking capabilities.
Abstract
The recent Segment Anything Model 2 (SAM2) has demonstrated exceptional capabilities in interactive object segmentation for both images and videos. However, as a foundational model for interactive segmentation, SAM2 performs segmentation directly based on mask memory from the past six frames, leading to two significant challenges. First, during inference in videos, objects may disappear since SAM2 relies solely on memory without accounting for object motion information, which limits its long-range object tracking capabilities. Second, its memory is constructed from fixed past frames, making it susceptible to challenges associated with object disappearance or occlusion, due to potentially inaccurate segmentation results in memory. To address these problems, we present MoSAM, incorporating two key strategies to integrate object motion cues into the model and establish more reliable…
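The fixed-window memory the abstract describes can be illustrated with a toy sketch. The first function below keeps the most recent six frames regardless of mask quality, as the abstract states for SAM2. The second is a hypothetical contrast only: the `score` field, the `min_score` threshold, and the filtering rule are assumptions made for illustration and are not MoSAM's actual selection criterion, which this listing does not spell out.

```python
from collections import deque

WINDOW = 6  # the abstract states that SAM2 uses mask memory from the past six frames


def fixed_window_memory(frames, window=WINDOW):
    """Keep the most recent `window` frames, regardless of mask quality."""
    memory = deque(maxlen=window)
    for frame in frames:
        memory.append(frame)
    return list(memory)


def selected_memory(frames, window=WINDOW, min_score=0.5):
    """Hypothetical contrast: keep only frames whose (assumed) score passes a threshold."""
    memory = deque(maxlen=window)
    for frame in frames:
        if frame["score"] >= min_score:
            memory.append(frame)
    return list(memory)


# Toy sequence: the object is visible in frames 0-5 and occluded in frames 6-11.
# The scores are made-up placeholders standing in for mask reliability.
frames = [{"index": i, "score": 0.9 if i < 6 else 0.1} for i in range(12)]

fixed = [f["index"] for f in fixed_window_memory(frames)]
kept = [f["index"] for f in selected_memory(frames)]

print(fixed)  # only the occluded frames remain in memory
print(kept)   # the last reliable frames are retained instead
```

In this toy run, six occluded frames are enough to push every reliable mask out of the fixed window, which is the failure mode the abstract attributes to memory built from fixed past frames.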
Taxonomy
Topics: Data Visualization and Analytics · Human Mobility and Location-Based Analysis · Data Management and Algorithms
Methods: Sparse Evolutionary Training · Focus
