TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model
Zhaoyuan Ding, Yijing Yang, Han Shu, Xinghao Chen

TL;DR
TinySAM 2 is a lightweight video segmentation model that uses memory quality management and token compression to cut resource use while keeping performance high, making practical deployment feasible.
Contribution
The paper introduces TinySAM 2, a lightweight video segmentation model that combines a memory quality management mechanism with joint spatial-temporal token compression, reducing computational cost while retaining 90% of SAM 2's performance.
Findings
Achieves 90% of SAM 2.1's performance on DAVIS and SA-V datasets.
Uses only 7% of the memory tokens and 3% of the training data used by SAM 2.
Significantly reduces parameter count and computational load for practical deployment.
Abstract
Segment Anything Model 2 (SAM 2) serves as a core foundation model in the field of video segmentation. Building upon the original SAM model, it introduces a memory bank mechanism and demonstrates outstanding performance in tasks such as semi-supervised video object segmentation and tracking anything. However, the complex computational characteristics of SAM 2's multi-stage image encoder and memory module have raised the barrier to the model's deployment in practical applications. To address this issue, we propose TinySAM 2, a lightweight video segmentation model that balances performance and efficiency. First, a memory quality management mechanism is introduced to select and retain highly informative historical frames as the memory. In addition, a joint spatial-temporal token compression scheme is proposed to reduce memory storage and computational cost. Specifically, average pooling is…
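The two ideas named in the abstract, keeping only informative frames in memory and shrinking each frame's tokens with average pooling, can be illustrated with a minimal sketch. This is not the authors' implementation: the `MemoryBank` class, the `avg_pool_tokens` helper, the `capacity` and `stride` parameters, and the per-frame `quality` score are all hypothetical names chosen for illustration, and how TinySAM 2 actually scores frames or pools tokens is not specified in the text above. The sketch covers only spatial pooling and frame selection, not the temporal part of the compression.

```python
from dataclasses import dataclass, field

Grid = list[list[list[float]]]  # H x W x C feature tokens for one frame


def avg_pool_tokens(grid: Grid, stride: int) -> Grid:
    """Average non-overlapping stride x stride windows of tokens."""
    h, w, c = len(grid), len(grid[0]), len(grid[0][0])
    if h % stride or w % stride:
        raise ValueError("grid height and width must be divisible by stride")
    pooled = []
    for i in range(0, h, stride):
        row = []
        for j in range(0, w, stride):
            window = [grid[i + di][j + dj]
                      for di in range(stride) for dj in range(stride)]
            row.append([sum(tok[k] for tok in window) / len(window)
                        for k in range(c)])
        pooled.append(row)
    return pooled


@dataclass
class MemoryBank:
    """Hypothetical bank: keeps at most `capacity` pooled frames,
    preferring frames with a higher (externally supplied) quality score."""
    capacity: int
    stride: int
    entries: list[tuple[float, int, Grid]] = field(default_factory=list)

    def add(self, frame_idx: int, quality: float, grid: Grid) -> None:
        pooled = avg_pool_tokens(grid, self.stride)
        self.entries.append((quality, frame_idx, pooled))
        if len(self.entries) > self.capacity:
            # Evict the lowest-quality frame; ties evict the older frame.
            worst = min(range(len(self.entries)),
                        key=lambda n: self.entries[n][:2])
            del self.entries[worst]

    def frame_indices(self) -> list[int]:
        return sorted(idx for _, idx, _ in self.entries)


# Toy usage: a 4 x 4 grid of 1-channel tokens, three frames, room for two.
frame = [[[float(r * 4 + c)] for c in range(4)] for r in range(4)]
bank = MemoryBank(capacity=2, stride=2)
for idx, quality in enumerate([0.9, 0.2, 0.7]):
    bank.add(idx, quality, frame)
print(bank.frame_indices())
```

In this toy setup a stride of 2 keeps one quarter of the tokens per frame, and the lowest-scoring frame is dropped once the bank is full. The specific ratios reported for TinySAM 2 depend on design details that the truncated abstract does not give.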
