Segment Anything Meets Point Tracking
Frano Rajič, Lei Ke, Yu-Wing Tai, Chi-Keung Tang, Martin Danelljan, Fisher Yu

TL;DR
SAM-PT introduces a novel point-centric approach for interactive video segmentation, leveraging long-term point tracking with SAM to improve zero-shot performance and interaction efficiency across multiple benchmarks.
Contribution
The paper proposes SAM-PT, a new method that uses point propagation for video segmentation, exploiting local structure information independently of object semantics.
Findings
Outperforms traditional mask propagation methods on multiple benchmarks.
Achieves better zero-shot performance in open-world video object segmentation.
Provides an efficient point-based tracking framework with publicly available code.
Abstract
The Segment Anything Model (SAM) has established itself as a powerful zero-shot image segmentation model, enabled by efficient point-centric annotation and prompt-based models. While click and brush interactions are both well explored in interactive image segmentation, the existing methods on videos focus on mask annotation and propagation. This paper presents SAM-PT, a novel method for point-centric interactive video segmentation, empowered by SAM and long-term point tracking. SAM-PT leverages robust and sparse point selection and propagation techniques for mask generation. Compared to traditional object-centric mask propagation strategies, we uniquely use point propagation to exploit local structure information agnostic to object semantics. We highlight the merits of point-based tracking through direct evaluation on the zero-shot open-world Unidentified Video Objects (UVO) benchmark.…
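The point-centric pipeline described in the abstract (sparse point selection, point propagation across frames, then mask generation from the propagated points) can be illustrated with a toy sketch. This is not the authors' implementation: every function below is a hypothetical stand-in written for illustration. In particular, `propagate_points` replaces a learned long-term point tracker with a fixed per-frame shift, and `points_to_mask` replaces a point-prompted SAM call with a union of small disks.

```python
import numpy as np

def select_query_points(mask, k, seed=0):
    """Pick up to k sparse (x, y) points inside a boolean mask by random sampling."""
    rng = np.random.default_rng(seed)
    ys, xs = np.nonzero(mask)
    idx = rng.choice(len(ys), size=min(k, len(ys)), replace=False)
    return np.stack([xs[idx], ys[idx]], axis=1).astype(float)

def propagate_points(points, flows):
    """Stand-in tracker (assumption): shift all points by (dx, dy) per frame.
    Returns an array of shape (num_frames, k, 2), frame 0 being the query frame."""
    tracks = [points]
    for dx, dy in flows:
        tracks.append(tracks[-1] + np.array([dx, dy], dtype=float))
    return np.stack(tracks)

def points_to_mask(points, shape, radius=3):
    """Stand-in for a point-prompted segmenter (assumption): union of disks."""
    h, w = shape
    yy, xx = np.mgrid[:h, :w]
    mask = np.zeros(shape, dtype=bool)
    for x, y in points:
        mask |= (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
    return mask

# First-frame annotation: a square object in a 32x32 frame.
first_mask = np.zeros((32, 32), dtype=bool)
first_mask[8:16, 8:16] = True

queries = select_query_points(first_mask, k=8)               # sparse point selection
tracks = propagate_points(queries, flows=[(2, 0), (2, 1)])   # point propagation
masks = [points_to_mask(p, first_mask.shape) for p in tracks]  # per-frame masks
```

The sketch only shows the data flow: points, not masks, are what get carried from frame to frame, and a mask is regenerated in each frame from the propagated points.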
Taxonomy
Topics: Visual Attention and Saliency Detection · Image and Video Quality Assessment · Advanced Image and Video Retrieval Techniques
Methods: Segment Anything Model · Focus
