MoSAM: Motion-Guided Segment Anything Model with Spatial-Temporal Memory Selection

Qiushi Yang; Yuan Yao; Miaomiao Cui; Liefeng Bo

arXiv:2505.00739·cs.CV·May 5, 2025

TL;DR

MoSAM enhances the Segment Anything Model 2 for video object segmentation by integrating motion cues and dynamic memory selection, significantly improving long-range object tracking and segmentation accuracy.

Contribution

This paper introduces MoSAM, a novel approach that incorporates motion-guided prompting and spatial-temporal memory selection to improve video object segmentation.

Findings

1. Achieves state-of-the-art results on multiple benchmarks.

2. Effectively handles object disappearance and occlusion.

3. Improves long-range object tracking capabilities.

Abstract

The recent Segment Anything Model 2 (SAM2) has demonstrated exceptional capabilities in interactive object segmentation for both images and videos. However, as a foundational model for interactive segmentation, SAM2 performs segmentation directly from the mask memory of the past six frames, which leads to two significant challenges. First, during video inference, tracked objects may be lost because SAM2 relies solely on memory and does not account for object motion, which limits its long-range object tracking capability. Second, its memory is constructed from a fixed set of past frames, which makes it vulnerable when an object disappears or is occluded, because the memory may then contain inaccurate segmentation results. To address these problems, we present MoSAM, incorporating two key strategies to integrate object motion cues into the model and establish more reliable…
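The contrast the abstract draws between a fixed past-frame memory and a selected one can be illustrated with a small sketch. This is not MoSAM's actual selection rule, which the truncated abstract does not specify. The `FrameMemory` record, its `confidence` score, and the `min_confidence` threshold are all hypothetical names introduced only for illustration; the window size of six is the only value taken from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameMemory:
    frame_idx: int
    # Hypothetical reliability score in [0, 1] for the mask predicted on
    # this frame. Assumed for illustration; not defined in the abstract.
    confidence: float


def fixed_window(history: List[FrameMemory], size: int = 6) -> List[FrameMemory]:
    """Return the most recent `size` frames, regardless of mask quality."""
    return history[-size:]


def select_reliable(
    history: List[FrameMemory], size: int = 6, min_confidence: float = 0.5
) -> List[FrameMemory]:
    """Return the most recent `size` frames whose confidence passes the threshold.

    This is one simple, assumed selection rule, used here only to show how
    unreliable frames could be kept out of the memory.
    """
    reliable = [m for m in history if m.confidence >= min_confidence]
    return reliable[-size:]


# Ten frames; the object is occluded in frames 6 and 7, so their masks are unreliable.
history = [
    FrameMemory(frame_idx=i, confidence=0.1 if i in (6, 7) else 0.9)
    for i in range(10)
]

fixed_ids = [m.frame_idx for m in fixed_window(history)]
selected_ids = [m.frame_idx for m in select_reliable(history)]
```

In this toy run the fixed window still contains the two occluded frames, while the filtered memory reaches further back to replace them with earlier reliable frames.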

Taxonomy

Topics: Data Visualization and Analytics · Human Mobility and Location-Based Analysis · Data Management and Algorithms

Methods: Sparse Evolutionary Training · Focus