# SCT-Diff: Seamless Contextual Tracking via Diffusion Trajectory

**Authors:** Guohao Nie, Xingmei Wang, Debin Zhang, He Wang

PMC · DOI: 10.3390/jimaging12010038 · Journal of Imaging · 2026-01-09

## TL;DR

SCT-Diff is a video-level tracking framework that uses a diffusion model to refine noisy trajectory proposals over whole clips. Future frames within a sliding time window are used to correct earlier estimates, which limits error accumulation.

## Contribution

SCT-Diff introduces a diffusion-based tracking framework with closed-loop feedback from future frames to maintain temporal consistency.
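
The sliding-window feedback idea can be illustrated with a minimal sketch. It is not the authors' method: the diffusion-based refiner is replaced by a hypothetical placeholder (`robust_line_fit`, a median-based line fit), the target is a 1-D position, and the function names, window length and "latest window overwrites earlier estimates" rule are all assumptions made for illustration. The only point it shows is that each frame is re-estimated by every window that covers it, so frames that arrive later can revise earlier estimates.

```python
import numpy as np


def robust_line_fit(window):
    """Placeholder refiner (assumption): median-based line fit over one window.

    Stands in for the paper's diffusion-based refinement, which is not
    reproduced here. It uses all frames in the window, past and future alike.
    """
    t = np.arange(len(window))
    slope = np.median(np.diff(window))          # robust per-frame velocity
    intercept = np.median(window - slope * t)   # robust starting position
    return intercept + slope * t


def sliding_window_track(detections, window=5, refine=robust_line_fit):
    """Re-estimate every frame in each window that covers it.

    Windows advance one frame at a time; the most recent window overwrites
    earlier estimates, so later frames can correct earlier ones.
    """
    detections = np.asarray(detections, dtype=float)
    estimates = detections.copy()
    for start in range(0, len(detections) - window + 1):
        stop = start + window
        estimates[start:stop] = refine(detections[start:stop])
    return estimates


# Toy data: a target moving one unit per frame, with a bad detection at frame 5.
truth = np.arange(12, dtype=float)
detections = truth.copy()
detections[5] += 10.0

estimates = sliding_window_track(detections, window=5)
print(np.abs(detections - truth).max(), np.abs(estimates - truth).max())
```

In this toy case the outlier at frame 5 is removed because every window that covers it also contains clean neighbouring frames. A real tracker would work on bounding boxes and image features rather than scalar positions.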

## Key findings

- SCT-Diff achieves 75.4% average overlap (AO) on the GOT-10k benchmark.
- The model maintains real-time computational efficiency while improving tracking accuracy.
- The framework uses a Mamba-based decoder to model trajectories as discrete token sequences.
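
For readers unfamiliar with the AO metric quoted above, the following is a minimal sketch of average overlap as the mean intersection-over-union (IoU) between predicted and ground-truth boxes. The `(x, y, w, h)` box format and the function names are assumptions for illustration. The official GOT-10k evaluation protocol (for example, how sequences are aggregated) is not reproduced here.

```python
def iou_xywh(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predicted, ground_truth):
    """Mean IoU over the frames of one sequence."""
    overlaps = [iou_xywh(p, g) for p, g in zip(predicted, ground_truth)]
    return sum(overlaps) / len(overlaps)


# Two frames: a perfect prediction and one shifted by half the box width.
ground_truth = [(0, 0, 10, 10), (0, 0, 10, 10)]
predicted = [(0, 0, 10, 10), (5, 0, 10, 10)]
ao = average_overlap(predicted, ground_truth)
print(f"AO = {ao:.3f}")
```

The second frame has an intersection of 50 and a union of 150, so its IoU is 1/3 and the sequence AO is 2/3.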

## Abstract

Existing detection-based trackers exploit temporal contexts by updating appearance models or modeling target motion. However, the sequential one-shot integration of temporal priors risks amplifying error accumulation, as frame-level template matching restricts comprehensive spatiotemporal analysis. To address this, we propose SCT-Diff, a video-level framework that holistically estimates target trajectories. Specifically, SCT-Diff processes video clips globally via a diffusion model to incorporate bidirectional spatiotemporal awareness, where reverse diffusion steps progressively refine noisy trajectory proposals into optimal predictions. Crucially, SCT-Diff enables iterative correction of historical trajectory hypotheses by observing future contexts within a sliding time window. This closed-loop feedback from future frames preserves temporal consistency and breaks the error propagation chain under complex appearance variations. For joint modeling of appearance and motion dynamics, we formulate trajectories as unified discrete token sequences. The designed Mamba-based expert decoder bridges visual features with language-formulated trajectories, enabling lightweight yet coherent sequence modeling. Extensive experiments demonstrate SCT-Diff’s superior efficiency and performance, achieving 75.4% AO on GOT-10k while maintaining real-time computational efficiency.
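
The abstract does not say how trajectories are turned into discrete tokens. One common approach, shown here purely as an assumption, is to normalize box corners by the image size and quantize them into a fixed number of bins, so that a trajectory becomes a flat sequence of integers. The bin count (1000), the `(x, y, w, h)` input format and all function names below are hypothetical.

```python
def box_to_tokens(box, img_w, img_h, bins=1000):
    """Quantize one (x, y, w, h) box into four integer tokens (x1, y1, x2, y2)."""
    x, y, w, h = box
    normalized = (x / img_w, y / img_h, (x + w) / img_w, (y + h) / img_h)
    # Clamp so coordinates on or beyond the image border stay in range.
    return [min(bins - 1, max(0, int(v * bins))) for v in normalized]


def tokens_to_box(tokens, img_w, img_h, bins=1000):
    """Map four tokens back to an (x, y, w, h) box using bin centres."""
    x1, y1, x2, y2 = [(t + 0.5) / bins for t in tokens]
    return (x1 * img_w, y1 * img_h, (x2 - x1) * img_w, (y2 - y1) * img_h)


def trajectory_to_tokens(boxes, img_w, img_h, bins=1000):
    """Flatten a list of per-frame boxes into one token sequence."""
    sequence = []
    for box in boxes:
        sequence.extend(box_to_tokens(box, img_w, img_h, bins))
    return sequence


# A two-frame trajectory in a 640x480 video.
trajectory = [(100, 50, 200, 120), (110, 52, 200, 120)]
tokens = trajectory_to_tokens(trajectory, img_w=640, img_h=480)
recovered = tokens_to_box(tokens[:4], img_w=640, img_h=480)
print(tokens)
print(recovered)
```

Quantization is lossy: with 1000 bins on a 640-pixel-wide frame, each bin spans 0.64 pixels, so a decoded box matches the original only to within about one bin.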

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12843046/full.md

## Figures

10 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12843046/full.md

## References

96 references — full list in the complete paper: https://tomesphere.com/paper/PMC12843046/full.md

---
Source: https://tomesphere.com/paper/PMC12843046