Segment and Track Anything
Yangming Cheng, Liulei Li, Yuanyou Xu, Xiaodi Li, Zongxin Yang, Wenguan Wang, Yi Yang

TL;DR
SAM-Track is a framework that combines segmentation and tracking in videos. Users select the objects to track through click, stroke, or text interactions, which makes it applicable across a range of fields.
Contribution
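The paper introduces SAM-Track, which integrates SAM for key-frame segmentation, an AOT-based tracking model (DeAOT), and Grounding-DINO for text-based interaction, giving multimodal, interactive object segmentation and tracking in videos. It reports 92.0% on DAVIS-2016 Val and 79.2% on DAVIS-2017 Test.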
Findings
Achieved 92.0% on DAVIS-2016 Val
Achieved 79.2% on DAVIS-2017 Test
Supports multiple interaction modes (click, stroke, text)
Abstract
This report presents a framework called Segment And Track Anything (SAM-Track) that allows users to precisely and effectively segment and track any object in a video. Additionally, SAM-Track employs multimodal interaction methods that enable users to select multiple objects in videos for tracking, corresponding to their specific requirements. These interaction methods comprise click, stroke, and text, each possessing unique benefits and capable of being employed in combination. As a result, SAM-Track can be used across an array of fields, including drone technology, autonomous driving, medical imaging, augmented reality, and biological analysis. SAM-Track amalgamates Segment Anything Model (SAM), an interactive key-frame segmentation model, with our proposed AOT-based tracking model (DeAOT), which secured 1st place in four tracks of the VOT 2022 challenge, to facilitate object tracking…
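The division of labor described above (prompted segmentation on a key frame, then propagation of that mask through the remaining frames) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `Prompt`, `segment_key_frame`, `track`, and the three callables `segment_points`, `segment_box`, and `ground_text` are hypothetical names standing in for a SAM-like segmenter, a Grounding-DINO-like text grounder, and a DeAOT-like tracker. Routing text prompts through boxes before segmentation is an assumption of this sketch, and frames and masks are toy nested lists rather than real images.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]              # (row, col)
Box = Tuple[int, int, int, int]      # (row0, col0, row1, col1), inclusive
Frame = List[List[int]]              # toy image: rows of intensities
Mask = List[List[int]]               # 0 = background, k > 0 = object id k


@dataclass
class Prompt:
    """One user interaction on the key frame (hypothetical structure)."""
    kind: str                        # "click", "stroke", or "text"
    points: Sequence[Pixel] = ()     # used by click and stroke prompts
    text: str = ""                   # used by text prompts


def segment_key_frame(
    frame: Frame,
    prompts: Sequence[Prompt],
    segment_points: Callable[[Frame, Sequence[Pixel]], Set[Pixel]],
    segment_box: Callable[[Frame, Box], Set[Pixel]],
    ground_text: Callable[[Frame, str], List[Box]],
) -> Mask:
    """Turn each prompt into one object id on the key-frame mask."""
    mask = [[0] * len(frame[0]) for _ in frame]
    for obj_id, prompt in enumerate(prompts, start=1):
        if prompt.kind == "text":
            # Assumed flow: text -> boxes (grounder) -> pixels (segmenter).
            pixels: Set[Pixel] = set()
            for box in ground_text(frame, prompt.text):
                pixels |= segment_box(frame, box)
        elif prompt.kind in ("click", "stroke"):
            pixels = segment_points(frame, prompt.points)
        else:
            raise ValueError(f"unknown prompt kind: {prompt.kind!r}")
        for r, c in pixels:
            mask[r][c] = obj_id      # later prompts overwrite earlier ones
    return mask


def track(
    frames: Sequence[Frame],
    key_mask: Mask,
    propagate: Callable[[Frame, Mask, Frame], Mask],
) -> List[Mask]:
    """Propagate the key-frame mask forward, one frame at a time."""
    masks = [key_mask]
    for prev_frame, cur_frame in zip(frames, frames[1:]):
        masks.append(propagate(prev_frame, masks[-1], cur_frame))
    return masks
```

Passing the three models in as callables keeps the sketch independent of any particular library: real segmenter, grounder, and tracker wrappers could be substituted without changing the control flow.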
Taxonomy
Topics: AI in Service Interactions · Virtual Reality Applications and Impacts
Methods: Test
