SAM 2++: Tracking Anything at Any Granularity

Jiaming Zhang; Cheng Liang; Yichun Yang; Chenkai Zeng; Yutao Cui; Xinwen Zhang; Xin Zhou; Kai Ma; Gangshan Wu; Limin Wang

arXiv:2510.18822·cs.CV·May 19, 2026


TL;DR

SAM 2++ introduces a unified framework capable of tracking targets at various granularities such as masks, boxes, and points, leveraging task-specific prompts and a task-adaptive memory mechanism.
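The three granularities in the TL;DR are nested. A mask fixes a box, and either one fixes a representative point, so the coarser target states can be derived from the finer one. The sketch below illustrates only that relationship. It is not code from the SAM 2++ repository, and `Box`, `mask_to_box`, and `mask_to_point` are hypothetical names. It assumes half-open pixel boxes and uses the mask centroid as the point.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

import numpy as np


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in pixel coordinates; x1 and y1 are exclusive (assumption)."""
    x0: int
    y0: int
    x1: int
    y1: int


def mask_to_box(mask: np.ndarray) -> Optional[Box]:
    """Tightest box around the foreground pixels of a binary mask (None if empty)."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return Box(int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1)


def mask_to_point(mask: np.ndarray) -> Optional[Tuple[float, float]]:
    """(x, y) centroid of the foreground pixels (None if empty).

    Assumption: the centroid serves as the representative point; for a
    non-convex mask it can fall outside the object.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())


# A 6x8 frame whose target occupies rows 1-3 and columns 2-5.
mask = np.zeros((6, 8), dtype=bool)
mask[1:4, 2:6] = True
print(mask_to_box(mask))    # Box(x0=2, y0=1, x1=6, y1=4)
print(mask_to_point(mask))  # (3.5, 2.0)
```

The reverse direction is under-determined: many different masks share the same box, and many boxes contain the same point.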

Contribution

It presents the first unified video tracking model that handles multiple granularities with a common architecture, and it introduces a new, diverse tracking dataset.

Findings

01. Sets a new state of the art across multiple tracking tasks.

02. Effectively unifies memory across different granularities.

03. Demonstrates robustness and versatility in diverse tracking scenarios.

Abstract

Due to the varying granularity of target states across different tasks, most existing trackers are tailored to a single task. This specificity limits their generalization, prevents them from effectively utilizing multi-task training data, and leads to redundancy in both model design and parameters. Although recent unified vision models share partial architectures across tasks, they usually retain task-specific interfaces and overlook the common tracking principle behind different granularities, leaving a gap for truly unified video tracking. To unify video tracking tasks, we present SAM 2++, a unified framework that can handle target states at different granularities, including masks, boxes, and points, through an integrated design of prompt encoding, output decoding, and memory representation. First, to handle different target granularities, we design task-specific prompts that map…
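The abstract's sentence on task-specific prompts is cut off, so the sketch below does not claim to reproduce the SAM 2++ prompt encoder. It assumes the convention of the original SAM prompt encoder, which SAM 2 inherits: a point becomes one sparse positional token, a box becomes two tokens (its corners), and a mask is a dense prompt. Random Fourier features and random type vectors stand in for learned weights. `encode_prompt`, `fourier_encode`, `EMBED_DIM`, and `MASK_GRID` are hypothetical names.

```python
from typing import Optional, Tuple

import numpy as np

# Hypothetical sizes; a real encoder would use a much larger embedding.
EMBED_DIM = 8
MASK_GRID = 4

rng = np.random.default_rng(0)
# Random Gaussian projection for Fourier positional features (fixed, not learned).
GAUSS = rng.normal(size=(2, EMBED_DIM // 2))
# Random vectors standing in for learned per-role type embeddings.
TYPE_EMB = {role: rng.normal(size=EMBED_DIM) for role in ("point", "box_tl", "box_br")}


def fourier_encode(xy, hw: Tuple[int, int]) -> np.ndarray:
    """Map (N, 2) pixel (x, y) coordinates to (N, EMBED_DIM) sin/cos features."""
    h, w = hw
    norm = np.asarray(xy, dtype=float) / np.array([w, h])  # scale to [0, 1]
    proj = 2 * np.pi * ((2 * norm - 1) @ GAUSS)             # shift to [-1, 1], project
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)


def encode_prompt(kind: str, value, hw: Tuple[int, int]) -> Tuple[np.ndarray, Optional[np.ndarray]]:
    """Return (sparse tokens of shape (N, EMBED_DIM), dense grid or None).

    hw is the (height, width) of the frame; it is ignored for masks,
    whose size is read from the array itself.
    """
    if kind == "point":
        return fourier_encode([value], hw) + TYPE_EMB["point"], None
    if kind == "box":
        x0, y0, x1, y1 = value
        corners = fourier_encode([(x0, y0), (x1, y1)], hw)
        # Each corner gets its own type embedding so the two can be told apart.
        return corners + np.stack([TYPE_EMB["box_tl"], TYPE_EMB["box_br"]]), None
    if kind == "mask":
        m = np.asarray(value, dtype=float)
        h, w = m.shape
        if h % MASK_GRID or w % MASK_GRID:
            raise ValueError("mask size must be divisible by MASK_GRID")
        # Block average pooling stands in for a learned downscaling network.
        dense = m.reshape(MASK_GRID, h // MASK_GRID, MASK_GRID, w // MASK_GRID).mean(axis=(1, 3))
        return np.zeros((0, EMBED_DIM)), dense
    raise ValueError(f"unknown prompt kind: {kind!r}")


tokens, _ = encode_prompt("box", (4, 4, 40, 30), hw=(64, 64))
print(tokens.shape)  # (2, 8)
```

Giving every sparse prompt the same token width is what lets a single decoder consume points and boxes interchangeably. In SAM the dense mask embedding is added to the image features rather than appended as tokens.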


Code & Models

Repositories

mcg-nju/SAM2-Plus (GitHub)

Models

MCG-NJU/SAM2-Plus (Hugging Face model)

Datasets

MCG-NJU/Tracking-Any-Granularity (dataset)


Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Gaze Tracking and Assistive Technology