EfficientSAM3: Progressive Hierarchical Distillation for Video Concept Segmentation from SAM1, 2, and 3
Chengxi Zeng, Yuxuan Jiang, Aaron Zhang

TL;DR
EfficientSAM3 introduces a progressive hierarchical distillation approach for building lightweight, on-device-capable models for video concept segmentation that closely approach the performance of the much larger SAM3 model.
Contribution
The paper proposes Progressive Hierarchical Distillation (PHD), a three-stage distillation framework that efficiently transfers capabilities from SAM3 to smaller models suitable for on-device use.
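The "progressive" part of PHD can be pictured as an ordered schedule in which each stage starts from the weights produced by the previous one. The sketch below encodes the three stages and datasets named in the abstract. The module names, the per-stage freezing policy, and the `run_phd` / `count_updates` helpers are hypothetical illustrations chosen for this example, not the authors' code.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    dataset: str                # training data named in the EfficientSAM3 abstract
    trainable: Tuple[str, ...]  # HYPOTHETICAL: which student modules receive gradients


# Stage order and datasets follow the abstract; module names and freezing are assumptions.
PHD_STAGES = (
    Stage("encoder_distillation", "SA-1B", ("image_encoder",)),
    Stage("temporal_memory_distillation", "SA-V", ("memory_module",)),
    Stage("end_to_end_finetuning", "SAM3 PCS data",
          ("image_encoder", "detector", "memory_module", "tracker")),
)

Weights = Dict[str, int]  # toy stand-in for a checkpoint: module name -> state


def run_phd(stages, train_stage: Callable[[Stage, Weights], Weights]):
    """Run stages in order; each stage is initialized from the previous stage's output."""
    weights: Weights = {}
    history: List[str] = []
    for stage in stages:
        weights = train_stage(stage, weights)
        history.append(stage.name)
    return weights, history


def count_updates(stage: Stage, weights: Weights) -> Weights:
    # toy trainer: bump a counter for every module this stage is allowed to update
    new = dict(weights)
    for module in stage.trainable:
        new[module] = new.get(module, 0) + 1
    return new


final, order = run_phd(PHD_STAGES, count_updates)
# image_encoder and memory_module are each updated twice; detector and tracker once
```

Keeping the schedule as data makes the design choice explicit: early stages isolate one component against the teacher, and only the last stage updates the whole pipeline jointly.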
Findings
Achieves strong performance-efficiency trade-offs on video object segmentation (VOS) datasets.
Enables on-device concept segmentation and tracking.
Maintains high fidelity to the teacher model.
Abstract
The Segment Anything Model 3 (SAM3) advances visual understanding with Promptable Concept Segmentation (PCS) across images and videos, but its unified architecture (shared vision backbone, DETR-style detector, dense-memory tracker) remains prohibitive for on-device use. We present EfficientSAM3, a family of efficient models built on Progressive Hierarchical Distillation (PHD) that transfers capability from SAM3 to lightweight students in three stages: (1) Encoder Distillation aligns image features via prompt-in-the-loop training on SA-1B; (2) Temporal Memory Distillation replaces dense memory with a compact Perceiver-based module trained on SA-V to compress and retrieve spatiotemporal features efficiently; and (3) End-to-End Fine-Tuning refines the full pipeline on the official SAM3 PCS data to preserve concept-level performance. PHD yields a spectrum of student variants using RepViT,…
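Temporal Memory Distillation replaces SAM3's dense memory with a Perceiver-based module. The sketch below shows the generic Perceiver idea under that reading: a fixed set of latent vectors cross-attends to all past-frame tokens to compress them, and current-frame tokens then cross-attend to those latents to retrieve context. The single-head attention, the residual read-out, the random initialization, the sizes, and the class name `PerceiverMemory` are assumptions for illustration; the paper's actual module and its distillation objective are not reproduced here.

```python
import numpy as np


def softmax(x, axis=-1):
    # subtract the max for numerical stability
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context, w_q, w_k, w_v):
    """Single-head scaled dot-product attention of `queries` over `context`."""
    q, k, v = queries @ w_q, context @ w_k, context @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (num_queries, num_context)
    return softmax(scores, axis=-1) @ v       # (num_queries, dim)


class PerceiverMemory:
    """HYPOTHETICAL Perceiver-style memory: N latent slots summarize any number of tokens."""

    def __init__(self, dim, num_latents, rng):
        # random here; in a real model these would be learned during training
        self.latents = 0.02 * rng.standard_normal((num_latents, dim))
        scale = 1.0 / np.sqrt(dim)
        self.w_write = [scale * rng.standard_normal((dim, dim)) for _ in range(3)]
        self.w_read = [scale * rng.standard_normal((dim, dim)) for _ in range(3)]

    def compress(self, memory_tokens):
        # latents attend over all past tokens: the attention matrix is
        # num_latents x num_tokens, and the output size does not grow with frames stored
        return cross_attention(self.latents, memory_tokens, *self.w_write)

    def retrieve(self, frame_tokens, compressed):
        # current-frame tokens read from the compact memory, added back residually
        return frame_tokens + cross_attention(frame_tokens, compressed, *self.w_read)


rng = np.random.default_rng(0)
dim, tokens_per_frame, num_frames = 64, 16 * 16, 8
memory = PerceiverMemory(dim, num_latents=32, rng=rng)
past = rng.standard_normal((num_frames * tokens_per_frame, dim))  # 2048 past tokens
compact = memory.compress(past)                                    # (32, 64)
current = rng.standard_normal((tokens_per_frame, dim))
fused = memory.retrieve(current, compact)                          # (256, 64)
```

Because `compact` stays 32 x 64 whether one frame or hundreds are stored, the per-frame read cost is fixed; this bounded memory footprint is the usual motivation for swapping a dense memory bank for a latent bottleneck.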
Taxonomy
Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Neural Network Applications
