Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking

Deyi Zhu; Yuji Wang; Yong Liu; Yansong Tang; Bingyao Yu; Jiwen Lu; Jie Zhou

arXiv:2605.22538·cs.CV·May 22, 2026

TL;DR

SAMOSA enhances foundation model-based visual object tracking by explicitly modeling motion, geometry, and semantics, leading to improved robustness and generalization in complex nonlinear scenarios.

Contribution

SAMOSA is a tracking framework that adapts SAM 2 to visual object tracking by explicitly incorporating motion prediction, semantic cues, and geometric constraints.

Findings

01. Outperforms state-of-the-art SAM 2-based methods on benchmarks.

02. Demonstrates stronger generalization than supervised methods.

03. Achieves significant gains on anti-UAV datasets.

Abstract

Traditional visual object tracking (VOT) methods typically rely on task-specific supervised training, limiting their generalization to unseen objects and challenging scenarios with distractors, occlusion, and nonlinear motion. Recent vision foundation models, exemplified by SAM 2, learn strong video understanding priors from large-scale pretraining and offer a promising foundation for building more robust and generalizable trackers. However, directly applying SAM 2 to VOT remains suboptimal, as it does not explicitly model target motion dynamics or enforce geometric and semantic consistency across frames, both of which are essential for reliable tracking. To address this issue, we propose SAMOSA, a new tracking framework that adapts SAM 2 to complex VOT scenarios by explicitly leveraging motion, geometry, and semantic cues. Specifically, we introduce a lightweight nonlinear motion…

Code & Models

Repositories

DurYi/SAMOSA (GitHub)
