SOTFormer: A Minimal Transformer for Unified Object Tracking and Trajectory Prediction
Zhongping Dong, Pengyang Yu, Shuangjian Li, Liming Chen, and Mohand Tahar Kechadi

TL;DR
SOTFormer is a minimal, real-time transformer model that unifies object tracking and trajectory prediction, achieving high accuracy and efficiency under challenging conditions like occlusion and scale variation.
Contribution
It introduces a lightweight, end-to-end transformer framework with a ground-truth-primed memory and a burn-in anchor loss that stabilizes initialization, improving stability and speed over prior models built on recurrent or stacked temporal encoders.
Findings
Achieves 76.3 AUC on the Mini-LaSOT (20%) benchmark.
Runs at 53.7 FPS with AMP, using 4.3 GB of VRAM.
Outperforms prior transformer-based models in accuracy and speed.
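The summary does not say how the AUC figure is computed. As a rough illustration only, the sketch below assumes the success-plot convention common in single-object tracking benchmarks: the fraction of frames whose predicted box overlaps the ground truth by more than an IoU threshold, averaged over evenly spaced thresholds. The function names, the (x, y, w, h) box format, and the 21-threshold grid are assumptions made for this example, not details taken from the paper.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap rectangle, clamped at zero.
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_auc(pred_boxes, gt_boxes, num_thresholds=21):
    """Mean success rate over IoU thresholds in [0, 1], scaled to 0..100.

    Assumed convention: a frame counts as a success at threshold t
    when its IoU is strictly greater than t.
    """
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return 100.0 * sum(rates) / len(rates)
```

Under this convention a tracker that loses the target for part of a sequence is penalized at every threshold, which is why the metric is sensitive to the occlusion and drift failures the paper targets.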
Abstract
Accurate single-object tracking and short-term motion forecasting remain challenging under occlusion, scale variation, and temporal drift, which disrupt the temporal coherence required for real-time perception. We introduce SOTFormer, a minimal constant-memory temporal transformer that unifies object detection, tracking, and short-horizon trajectory prediction within a single end-to-end framework. Unlike prior models with recurrent or stacked temporal encoders, SOTFormer achieves stable identity propagation through a ground-truth-primed memory and a burn-in anchor loss that explicitly stabilizes initialization. A single lightweight temporal-attention layer refines embeddings across frames, enabling real-time inference with fixed GPU memory. On the Mini-LaSOT (20%) benchmark, SOTFormer attains 76.3 AUC and 53.7 FPS (AMP, 4.3 GB VRAM), outperforming transformer baselines such as…
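To make the constant-memory idea concrete, here is a minimal sketch of a single attention step over a fixed-size buffer of past embeddings. It is not the authors' implementation: the class name `TemporalMemory`, the buffer capacity, the residual update, and the reading of "ground-truth-primed" as seeding the buffer from the annotated first frame are all assumptions made for illustration. Pure Python stands in for a tensor library to keep the example self-contained.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class TemporalMemory:
    """Hypothetical fixed-size buffer of past target embeddings."""

    def __init__(self, capacity, dim):
        self.dim = dim
        # deque(maxlen=...) evicts the oldest entry on overflow,
        # so memory use does not grow with sequence length.
        self.slots = deque(maxlen=capacity)

    def prime(self, gt_embedding):
        # Assumption: priming seeds memory with an embedding derived
        # from the ground-truth box of the first frame.
        self.slots.clear()
        self.slots.append(list(gt_embedding))

    def refine(self, query):
        """One single-head scaled dot-product attention step over memory."""
        if not self.slots:
            raise ValueError("call prime() before refine()")
        scale = 1.0 / math.sqrt(self.dim)
        weights = softmax([dot(query, key) * scale for key in self.slots])
        context = [
            sum(w * value[i] for w, value in zip(weights, self.slots))
            for i in range(self.dim)
        ]
        # Residual connection: the refined embedding is query plus context.
        refined = [q + c for q, c in zip(query, context)]
        self.slots.append(refined)
        return refined
```

A usage pattern under these assumptions would call `prime` once on the first frame and `refine` once per subsequent frame; the buffer length stays at `capacity` however long the video runs, which is the property the abstract describes as fixed GPU memory.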
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Human Pose and Action Recognition
