TSMS-SAM2: Multi-scale Temporal Sampling Augmentation and Memory-Splitting Pruning for Promptable Video Object Segmentation and Tracking in Surgical Scenarios
Guoping Xu, Hua-Chieh Shao, You Zhang

TL;DR
TSMS-SAM2 improves promptable video object segmentation and tracking in surgical videos by combining multi-temporal-scale sampling augmentation with memory splitting and pruning, reaching mean Dice scores of 95.24 on EndoVis2017 and 86.73 on EndoVis2018.
Contribution
It introduces multi-temporal-scale sampling augmentation and memory-splitting pruning to enhance the robustness and efficiency of SAM2 in complex surgical scenarios.
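To make the idea of multi-temporal-scale sampling concrete, here is a minimal sketch of one way to draw training clips at varying frame strides, so that the same video yields both slow and fast apparent motion. The function name `sample_clip_indices`, the stride set `(1, 2, 4)`, and the uniform choice of stride are illustrative assumptions, not details taken from the paper.

```python
import random


def sample_clip_indices(num_frames, clip_len, strides=(1, 2, 4), rng=None):
    """Return frame indices for one clip sampled at a randomly chosen stride.

    Hypothetical helper: the stride set and uniform stride choice are
    assumptions made for illustration only.
    """
    rng = rng or random.Random()
    # Keep only strides whose clip span fits inside the video.
    feasible = [s for s in strides if (clip_len - 1) * s < num_frames]
    if not feasible:
        raise ValueError("video too short for the requested clip length")
    stride = rng.choice(feasible)
    span = (clip_len - 1) * stride
    # randint is inclusive on both ends, so the last index stays in range.
    start = rng.randint(0, num_frames - 1 - span)
    return [start + i * stride for i in range(clip_len)]


if __name__ == "__main__":
    print(sample_clip_indices(num_frames=100, clip_len=8, rng=random.Random(0)))
```

A larger stride makes an instrument appear to move faster between consecutive sampled frames, which is one plausible way an augmentation of this kind could expose a model to more motion variability.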
Findings
Achieved the highest mean Dice scores of 95.24 on EndoVis2017 and 86.73 on EndoVis2018.
Outperformed prior SAM-based and task-specific methods.
Validated effectiveness through extensive ablation studies.
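For readers unfamiliar with the metric, the Dice score between a predicted mask A and a ground-truth mask B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it for binary masks given as nested lists. The helper name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions. The figures reported above appear to be on a 0 to 100 scale, which would correspond to multiplying this value by 100.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    if len(p) != len(t):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(a * b for a, b in zip(p, t))
    total = sum(p) + sum(t)
    if total == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    pred = [[1, 1], [0, 0]]
    target = [[1, 0], [0, 0]]
    print(round(100 * dice_score(pred, target), 2))
```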
Abstract
Promptable video object segmentation and tracking (VOST) has seen significant advances with the emergence of foundation models like Segment Anything Model 2 (SAM2); however, their application in surgical video analysis remains challenging due to complex motion dynamics and the redundancy of memory that impedes effective learning. In this work, we propose TSMS-SAM2, a novel framework that enhances promptable VOST in surgical videos by addressing challenges of rapid object motion and memory redundancy in SAM2. TSMS-SAM2 introduces two key strategies: multi-temporal-scale video sampling augmentation to improve robustness against motion variability, and a memory splitting and pruning mechanism that organizes and filters past frame features for more efficient and accurate segmentation. Evaluated on EndoVis2017 and EndoVis2018 datasets, TSMS-SAM2 achieved the highest mean Dice scores of 95.24…
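The abstract describes the memory mechanism only at a high level, as organizing and filtering past frame features. As a purely illustrative sketch of that general idea, the code below splits a memory bank into a short-term group (the most recent frames) and a long-term group, then prunes long-term entries that are nearly identical to ones already kept. The function `split_and_prune`, the cosine-similarity criterion, and the parameters `recent` and `sim_threshold` are all assumptions. The paper's actual splitting and pruning rules may differ.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def split_and_prune(memory, recent=2, sim_threshold=0.95):
    """Split memory into (short_term, long_term) and drop redundant long-term entries.

    memory: list of (frame_index, feature_vector) pairs, oldest first.
    Hypothetical rule: a long-term entry is kept only if its cosine similarity
    to every already-kept long-term entry is below sim_threshold.
    """
    short_term = memory[-recent:] if recent > 0 else []
    older = memory[:-recent] if recent > 0 else list(memory)
    long_term = []
    for idx, feat in older:
        if all(cosine(feat, kept) < sim_threshold for _, kept in long_term):
            long_term.append((idx, feat))
    return short_term, long_term


if __name__ == "__main__":
    bank = [(0, [1.0, 0.0]), (1, [1.0, 0.01]), (2, [0.0, 1.0]),
            (3, [1.0, 1.0]), (4, [1.0, 0.0])]
    short, long_ = split_and_prune(bank)
    print([i for i, _ in short], [i for i, _ in long_])
```

In this toy example, frame 1 is nearly identical to frame 0 and is dropped from the long-term group, while the two most recent frames are always retained.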
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Neural Network Applications · Visual Attention and Saliency Detection
