SAM-I2V: Upgrading SAM to Support Promptable Video Segmentation with Less than 0.2% Training Cost

Haiyang Mei; Pengyu Zhang; Mike Zheng Shou

arXiv:2506.01304·cs.CV·June 3, 2025

TL;DR

SAM-I2V upgrades the pre-trained SAM to support promptable video segmentation with little additional training, reaching over 90% of SAM 2's performance at only 0.2% of SAM 2's training cost.

Contribution

The paper introduces an image-to-video upgrade method for SAM that reduces training complexity and resource requirements while maintaining high segmentation performance.

Findings

01 Achieves over 90% of SAM 2's performance

02 Uses only 0.2% of SAM 2's training cost

03 Enables resource-efficient promptable video segmentation

Abstract

Foundation models like the Segment Anything Model (SAM) have significantly advanced promptable image segmentation in computer vision. However, extending these capabilities to videos presents substantial challenges, particularly in ensuring precise and temporally consistent mask propagation in dynamic scenes. SAM 2 attempts to address this by training a model on massive image and video data from scratch to learn complex spatiotemporal associations, resulting in huge training costs that hinder research and practical deployment. In this paper, we introduce SAM-I2V, an effective image-to-video upgradation method for cultivating a promptable video segmentation (PVS) model. Our approach strategically upgrades the pre-trained SAM to support PVS, significantly reducing training complexity and resource requirements. To achieve this, we introduce three key innovations: (i) an image-to-video…

Code & Models

Repositories

showlab/sam-i2v (Official)

Taxonomy

Topics: Advanced Neural Network Applications · Image Processing Techniques and Applications

Methods: Segment Anything Model