TinySAM 2: Extreme Memory Compression for Efficient Track Anything Model

Zhaoyuan Ding, Yijing Yang, Han Shu, Xinghao Chen

arXiv:2605.18013 · cs.CV · May 19, 2026

TL;DR

TinySAM 2 is a lightweight video segmentation model that keeps most of SAM 2's accuracy at much lower cost by combining memory quality management with token compression, which makes practical deployment more feasible.

Contribution

The paper introduces TinySAM 2, a lightweight model that pairs a memory quality management mechanism with joint spatial-temporal token compression, reducing computational cost while retaining 90% of SAM 2's performance.

Findings

01. Achieves 90% of SAM 2.1's performance on the DAVIS and SA-V datasets.

02. Uses only 7% of the memory tokens and 3% of the training data that SAM 2 uses.

03. Significantly reduces parameter count and computational load for practical deployment.

Abstract

Segment Anything Model 2 (SAM 2) serves as a core foundation model in the field of video segmentation. Building upon the original SAM model, it introduces a memory bank mechanism and demonstrates outstanding performance in tasks such as semi-supervised video object segmentation and track anything. However, the computational complexity of SAM 2's multi-stage image encoder and memory module raises the barrier to deploying the model in practical applications. To address this issue, we propose TinySAM 2, a lightweight video segmentation model that balances performance and efficiency. First, a memory quality management mechanism is introduced to select and retain highly informative historical frames as the memory. In addition, a joint spatial-temporal token compression scheme is proposed to reduce memory storage and computational cost. Specifically, average pooling is…
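The two ideas named in the abstract, keeping only informative frames in memory and shrinking each stored frame's token grid with average pooling, can be illustrated with a small toy sketch. This is not the authors' implementation: the per-frame quality scores, the 2x2 pooling window, the scalar "tokens", and the helper names `select_memory`, `avg_pool_2x2`, and `count_tokens` are all assumptions made for illustration, since the excerpt does not give the actual scoring criterion or pooling configuration.

```python
from typing import List

# Toy stand-in for a memory frame: an H x W grid of scalar values.
# Real memory tokens are feature vectors; scalars keep the sketch short.
Frame = List[List[float]]


def select_memory(frames: List[Frame], scores: List[float], keep: int) -> List[Frame]:
    """Keep the `keep` highest-scoring frames, returned in temporal order.

    The scores are hypothetical quality values supplied by the caller.
    """
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)[:keep]
    return [frames[i] for i in sorted(ranked)]


def avg_pool_2x2(grid: Frame) -> Frame:
    """Average non-overlapping 2x2 windows (assumes even height and width)."""
    h, w = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def count_tokens(memory: List[Frame]) -> int:
    """Total number of grid cells (tokens) held across all memory frames."""
    return sum(len(f) * len(f[0]) for f in memory)


# Four 4x4 frames; frame i is filled with the value i.
frames = [[[float(i)] * 4 for _ in range(4)] for i in range(4)]
scores = [0.9, 0.2, 0.7, 0.4]  # hypothetical quality scores

kept = select_memory(frames, scores, keep=2)   # frames 0 and 2 survive
compressed = [avg_pool_2x2(f) for f in kept]   # each 4x4 grid becomes 2x2

print(count_tokens(frames))      # 64 tokens before selection and pooling
print(count_tokens(compressed))  # 8 tokens afterwards
```

In this toy setup, frame selection reduces the memory along the temporal axis and pooling reduces it along the spatial axes. The resulting ratio is specific to the example and is not meant to reproduce the figures reported for TinySAM 2.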
