Depthwise-Dilated Convolutional Adapters for Medical Object Tracking and Segmentation Using the Segment Anything Model 2
Guoping Xu, Christopher Kabat, You Zhang

TL;DR
This paper introduces DD-SAM2, a novel adapter-based framework that efficiently adapts SAM2 for medical video segmentation and tracking, achieving high accuracy with limited training data and minimal parameter overhead.
Contribution
The paper proposes DD-SAM2, presented as the first systematic adaptation of SAM2 to medical videos. It uses Depthwise-Dilated Adapters to enhance multi-scale feature extraction with minimal parameter overhead.
Findings
Achieved Dice scores of 0.93 on TrackRad2025 and 0.97 on EchoNet-Dynamic.
Demonstrated effective fine-tuning with limited data and minimal parameters.
Outperformed existing static image adaptation methods on medical video tasks.
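For readers unfamiliar with the metric behind these numbers, the Dice score measures overlap between a predicted mask and a reference mask as 2|A∩B| / (|A| + |B|). The following is a minimal sketch for flat binary masks. The function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration. The paper's exact evaluation protocol (for example, how scores are averaged across frames) is not stated in this summary.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks (sequences of 0/1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    # Total foreground pixels across both masks.
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


# Illustrative masks (not data from the paper).
pred = [1, 1, 1, 0, 0, 0]
target = [1, 1, 0, 0, 0, 0]
score = dice_score(pred, target)  # 2 * 2 / (3 + 2) = 0.8
```

A score of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.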
Abstract
Recent advances in medical image segmentation have been driven by deep learning; however, most existing methods remain limited by modality-specific designs and exhibit poor adaptability to dynamic medical imaging scenarios. The Segment Anything Model 2 (SAM2) and its related variants, which introduce a streaming memory mechanism for real-time video segmentation, present new opportunities for prompt-based, generalizable solutions. Nevertheless, adapting these models to medical video scenarios typically requires large-scale datasets for retraining or transfer learning, leading to high computational costs and the risk of catastrophic forgetting. To address these challenges, we propose DD-SAM2, an efficient adaptation framework for SAM2 that incorporates a Depthwise-Dilated Adapter (DD-Adapter) to enhance multi-scale feature extraction with minimal parameter overhead. This design enables…
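To illustrate why depthwise, dilated convolutions are a natural fit for "multi-scale feature extraction with minimal parameter overhead", the sketch below compares parameter counts and receptive fields using standard convolution arithmetic. The channel width (256), the 3x3 kernel, and the dilation rates (1, 2, 3) are hypothetical values chosen for illustration. They are not taken from the paper, and this is not the DD-Adapter architecture itself.

```python
def conv_params(c_in, c_out, k, groups=1, bias=True):
    """Parameter count of a 2D convolution with a k x k kernel."""
    weights = (c_in // groups) * c_out * k * k
    return weights + (c_out if bias else 0)


def effective_kernel(k, dilation):
    """Side length of the area covered by a dilated k x k kernel."""
    return k + (k - 1) * (dilation - 1)


channels = 256          # hypothetical feature width
dilations = (1, 2, 3)   # hypothetical dilation rates, one branch each

# One standard 3x3 convolution per branch.
standard = sum(conv_params(channels, channels, 3) for _ in dilations)

# One depthwise 3x3 convolution per branch (groups == channels).
depthwise = sum(
    conv_params(channels, channels, 3, groups=channels) for _ in dilations
)

# Dilation widens the receptive field without adding parameters.
fields = [effective_kernel(3, d) for d in dilations]
```

Under these assumptions, the three depthwise branches use 7,680 parameters against 1,770,240 for standard convolutions, while covering 3x3, 5x5, and 7x7 neighborhoods. Dilation changes only the spacing of the kernel taps, so the parameter count is the same at every rate.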
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Human Pose and Action Recognition
