Det-SAM2: Technical Report on the Self-Prompting Segmentation Framework Based on Segment Anything Model 2
Zhiting Wang, Qiangong Zhou, Zongyang Liu

TL;DR
Det-SAM2 is an automated video segmentation framework built on SAM2. A detection model supplies the object prompts, so long videos can be segmented without manual prompting, with constant VRAM and RAM usage and the same efficiency and accuracy as the original SAM2.
Contribution
It introduces a fully automated pipeline, Det-SAM2, combining detection and SAM2 for scalable, resource-efficient video segmentation, demonstrated through an AI refereeing application.
Findings
Maintains original SAM2 accuracy and efficiency on long videos
Enables inference on infinitely long video streams with constant VRAM and RAM usage
Demonstrates practical application in billiards AI refereeing
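The constant-resource finding above can be illustrated with a small sketch. It is not the Det-SAM2 implementation: the function name `stream_with_bounded_state` and the parameter `max_cached` are hypothetical, and a fixed-size `deque` stands in for whatever per-frame state the real pipeline retains. The point is only that evicting the oldest state as new frames arrive keeps memory independent of stream length.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple


def stream_with_bounded_state(
    frames: Iterable[int], max_cached: int = 4
) -> Iterator[Tuple[int, int]]:
    """Yield (frame, cache_size) while retaining at most max_cached frames.

    Hypothetical illustration: integers stand in for frames, and the deque
    stands in for per-frame state kept for later reference.
    """
    # A deque with maxlen silently drops its oldest entry when full,
    # so the cache never grows past max_cached regardless of stream length.
    cache: Deque[int] = deque(maxlen=max_cached)
    for frame in frames:
        cache.append(frame)
        yield frame, len(cache)
```

Because `frames` is consumed lazily, the same loop works on a generator that never terminates; the cache size plateaus at `max_cached` after the first few frames.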
Abstract
Segment Anything Model 2 (SAM2) demonstrates exceptional performance in video segmentation and refinement of segmentation results. We anticipate that it can further evolve to achieve higher levels of automation for practical applications. Building upon SAM2, we conducted a series of practices that ultimately led to the development of a fully automated pipeline, termed Det-SAM2, in which object prompts are automatically generated by a detection model to facilitate inference and refinement by SAM2. This pipeline enables inference on infinitely long video streams with constant VRAM and RAM usage, all while preserving the same efficiency and accuracy as the original SAM2. This technical report focuses on the construction of the overall Det-SAM2 framework and the subsequent engineering optimization applied to SAM2. We present a case demonstrating an application built on the Det-SAM2…
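The self-prompting idea in the abstract, where a detection model generates the object prompts that SAM2 would otherwise need from a user, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code and not the SAM2 API: `Detection`, `boxes_to_prompts`, `self_prompting_loop`, and `ToySegmenter` are hypothetical names, and the score threshold `min_score` and the detection interval `detect_every` are assumptions that the abstract does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    """One detector output (hypothetical structure)."""
    object_id: int
    box: Box
    score: float


def boxes_to_prompts(detections: List[Detection], min_score: float = 0.5) -> Dict[int, Box]:
    """Keep confident detections and map each object id to a box prompt."""
    return {d.object_id: d.box for d in detections if d.score >= min_score}


class ToySegmenter:
    """Stand-in for a promptable video segmenter; it only records prompted ids."""

    def __init__(self) -> None:
        self.tracked: Dict[int, Box] = {}

    def __call__(self, frame: int, prompts: Dict[int, Box]) -> List[int]:
        # New prompts add or refresh objects; without prompts, tracking continues.
        self.tracked.update(prompts)
        return sorted(self.tracked)


def self_prompting_loop(
    frames: Iterable[int],
    detect: Callable[[int], List[Detection]],
    segment: Callable[[int, Dict[int, Box]], List[int]],
    detect_every: int = 5,
) -> Iterator[Tuple[int, List[int]]]:
    """Run the detector on every detect_every-th frame and feed its boxes as prompts."""
    for index, frame in enumerate(frames):
        prompts: Dict[int, Box] = {}
        if index % detect_every == 0:  # assumed schedule, for illustration only
            prompts = boxes_to_prompts(detect(frame))
        yield index, segment(frame, prompts)
```

In use, `detect` would wrap a real detection model and `segment` a real promptable segmenter; the loop itself needs no human input, which is the sense in which the pipeline is fully automated.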
Taxonomy
Topics: Persona Design and Applications · Context-Aware Activity Recognition Systems · Service-Oriented Architecture and Web Services
