SatSAM2: Motion-Constrained Video Object Tracking in Satellite Imagery using Promptable SAM2 and Kalman Priors
Ruijie Fan, Junyan Ye, Huan Chen, Zilong Huang, Xiaolei Wang, Weijia Li

TL;DR
SatSAM2 is a novel zero-shot satellite video tracker leveraging foundation models, Kalman filtering, and motion constraints to improve robustness and generalization in challenging satellite imagery scenarios.
Contribution
The paper introduces SatSAM2, which combines SAM2 with motion priors, and a new synthetic benchmark for satellite video tracking. Together they advance zero-shot tracking capabilities.
Findings
SatSAM2 outperforms traditional and foundation model trackers on benchmarks.
Achieves 5.84% AUC improvement on the OOTB dataset.
Introduces MVOT, a large synthetic satellite video benchmark.
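The AUC figure above most likely refers to the success-plot AUC used by OTB-style tracking benchmarks, though the summary does not define it. Under that metric, a frame counts as a success when the predicted box overlaps the ground truth with IoU above a threshold. The success rate is then averaged over thresholds from 0 to 1 in steps of 0.05. The sketch below assumes OOTB follows this convention. It uses axis-aligned (x, y, w, h) boxes for simplicity, even though a satellite benchmark may annotate oriented boxes. The function names are hypothetical.

```python
import numpy as np

def iou_xywh(a, b):
    """IoU between matching rows of two (N, 4) arrays of axis-aligned (x, y, w, h) boxes."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    # Avoid division by zero for degenerate boxes; IoU is defined as 0 there.
    return np.divide(inter, union, out=np.zeros_like(inter), where=union > 0)

def success_auc(pred, gt, num_thresholds=21):
    """OTB-style success AUC: mean success rate over IoU thresholds in [0, 1]."""
    ious = iou_xywh(np.asarray(pred, dtype=float), np.asarray(gt, dtype=float))
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    # Success rate at threshold t = fraction of frames with IoU strictly above t.
    success = np.array([(ious > t).mean() for t in thresholds])
    return float(success.mean())
```

Because the overlap test is strict, even a perfect tracker scores 20/21 rather than 1.0 with 21 thresholds. Comparisons between trackers remain consistent because every method is scored the same way.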
Abstract
Existing satellite video tracking methods often struggle with generalization, requiring scenario-specific training to achieve satisfactory performance, and are prone to track loss in the presence of occlusion. To address these challenges, we propose SatSAM2, a zero-shot satellite video tracker built on SAM2, designed to adapt foundation models to the remote sensing domain. SatSAM2 introduces two core modules: a Kalman Filter-based Constrained Motion Module (KFCMM) to exploit temporal motion cues and suppress drift, and a Motion-Constrained State Machine (MCSM) to regulate tracking states based on motion dynamics and reliability. To support large-scale evaluation, we propose MatrixCity Video Object Tracking (MVOT), a synthetic benchmark containing 1,500+ sequences and 157K annotated frames with diverse viewpoints, illumination, and occlusion conditions. Extensive experiments on two…
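The abstract does not detail how KFCMM and MCSM work internally. The sketch below therefore illustrates only the general idea of a Kalman prior that constrains a segmentation-based tracker, and it is not the authors' method. It runs a constant-velocity Kalman filter over the target centre. Each candidate measurement, such as the centroid of a mask proposed by SAM2, is checked against the filter's prediction with a chi-square gate on the squared Mahalanobis distance. The value 9.21 is the 99% quantile for 2 degrees of freedom. A small state machine then switches between TRACKING, OCCLUDED (coast on the prediction) and LOST. All class names, states, and thresholds are hypothetical.

```python
import numpy as np
from enum import Enum

class TrackState(Enum):
    TRACKING = "tracking"   # measurement accepted, filter updated
    OCCLUDED = "occluded"   # hypothetical: measurement rejected, coast on prediction
    LOST = "lost"           # hypothetical: too many consecutive rejections

class ConstantVelocityKF:
    """State x = [cx, cy, vx, vy]; measurement z = [cx, cy] (target centre in pixels)."""

    def __init__(self, cx, cy, dt=1.0, q=1e-2, r=1.0):
        self.x = np.array([cx, cy, 0.0, 0.0])
        self.P = np.eye(4) * 10.0                      # broad initial uncertainty
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * q                         # process noise
        self.R = np.eye(2) * r                         # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2].copy()

    def innovation(self, z):
        y = np.asarray(z, dtype=float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        return y, S

    def update(self, z):
        y, S = self.innovation(z)
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def step(kf, z, score, misses, gate=9.21, min_score=0.5, max_misses=5):
    """One frame of a hypothetical motion-constrained state machine.

    z: candidate centre from the segmenter (or None), score: its confidence.
    Returns (state, consecutive_misses, position_estimate).
    """
    pred = kf.predict()
    if z is not None and score >= min_score:
        y, S = kf.innovation(z)
        d2 = float(y @ np.linalg.solve(S, y))          # squared Mahalanobis distance
        if d2 <= gate:                                  # motion-consistent: accept
            kf.update(z)
            return TrackState.TRACKING, 0, kf.x[:2].copy()
    misses += 1                                         # rejected or missing measurement
    state = TrackState.LOST if misses > max_misses else TrackState.OCCLUDED
    return state, misses, pred
```

Gating on the innovation, rather than on raw pixel distance, lets the acceptance region grow while the filter is uncertain and shrink once velocity has converged. This is one plausible way to suppress the drift toward distractors that the abstract describes.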
