FARTrack: Fast Autoregressive Visual Tracking with High Performance
Guijie Wang, Tong Lin, Yifan Bai, Anjia Cao, Shiyi Liang, Wangbo Zhao, Xing Wei

TL;DR
FARTrack is a fast autoregressive visual tracking framework that uses task-specific self-distillation and inter-frame autoregressive sparsification to raise inference speed while keeping accuracy high, making it suitable for resource-limited devices.
Contribution
The paper presents FARTrack, a novel autoregressive tracking method with task-specific self-distillation and inter-frame sparsification, achieving high speed and competitive accuracy.
Findings
Achieves 70.6% AO on GOT-10k in real-time.
Reaches 343 FPS on GPU and 121 FPS on CPU.
Maintains high tracking performance with improved efficiency.
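
For readers unfamiliar with the AO figure above: AO (average overlap) is the mean intersection-over-union between predicted and ground-truth boxes. The snippet below is a minimal single-sequence sketch, assuming boxes in `(x, y, w, h)` format. The function names `iou` and `average_overlap` are illustrative and are not taken from the paper or the GOT-10k toolkit, whose evaluation protocol may differ in details.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(predictions, ground_truth):
    """Mean IoU over paired per-frame boxes (illustrative simplification)."""
    if not predictions or len(predictions) != len(ground_truth):
        raise ValueError("need two non-empty box lists of equal length")
    overlaps = [iou(p, g) for p, g in zip(predictions, ground_truth)]
    return sum(overlaps) / len(overlaps)
```

An AO of 70.6% would correspond to `average_overlap(...)` returning roughly 0.706 under this simplified definition.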
Abstract
Inference speed and tracking performance are two critical evaluation metrics in the field of visual tracking. However, high-performance trackers often suffer from slow processing speeds, making them impractical for deployment on resource-constrained devices. To alleviate this issue, we propose FARTrack, a Fast Auto-Regressive Tracking framework. Since autoregression emphasizes the temporal nature of the trajectory sequence, it can maintain high performance while achieving efficient execution across various devices. FARTrack introduces Task-Specific Self-Distillation and Inter-frame Autoregressive Sparsification, designed from the perspectives of shallow-yet-accurate distillation and redundant-to-essential token optimization, respectively. Task-Specific Self-Distillation achieves model compression by distilling task-specific tokens layer by layer, enhancing the model's inference speed…
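
The abstract does not spell out either mechanism, so the following is only a rough sketch of the two general ideas it names, under stated assumptions. It assumes that layer-by-layer distillation can be approximated by summing a KL divergence between teacher and student token distributions over paired layers, and that sparsification keeps the top-k tokens by an attention score. The layer pairing, the loss form, the selection criterion, and all function names here are hypothetical and should not be read as the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def layerwise_distillation_loss(teacher_logits, student_logits):
    """Sum KL(teacher || student) over paired layers.

    Each argument is a list with one logit vector per layer; pairing
    layer i with layer i is an assumption made for illustration.
    """
    return sum(
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    )


def keep_top_k_tokens(tokens, attention_scores, k):
    """Keep the k highest-scoring tokens, preserving their original order."""
    ranked = sorted(
        range(len(tokens)), key=lambda i: attention_scores[i], reverse=True
    )
    kept = sorted(ranked[:k])
    return [tokens[i] for i in kept]
```

For example, `keep_top_k_tokens(["a", "b", "c", "d"], [0.1, 0.7, 0.05, 0.15], 2)` keeps the tokens at positions 1 and 3, while the distillation loss is zero when the student's per-layer logits match the teacher's.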
Peer Reviews
Decision: ICLR 2026 Poster
- The paper is well presented, with clear motivation, solid organization, and consistent writing quality.
- FARTrack achieves impressive speed–accuracy trade-offs across GPU, CPU, and NPU platforms.
- The design is simple and potentially applicable to other lightweight tracking pipelines.
- Compared with prior autoregressive tracking baselines or other light-weight trackers, the training set now includes VastTrack and LaSOT Ext, which significantly expand data diversity and size. Since VastTrack has been shown to enhance generalization and robustness, using it for training raises fairness concerns in comparing FARTrack with earlier methods that were trained on smaller datasets.
- The visualization in Figure 5 suggests that feature representations from layers 7–14 appear highly si
- **Strong empirical results** FARTrack-tiny achieves 70.6% AO on GOT-10k at 135 FPS (GPU), outperforming closest competitors while maintaining comparable speed. Multi-platform evaluation (GPU/CPU/NPU) demonstrates practical applicability.
- **Comprehensive ablation studies** Thorough analysis of distillation strategies, sparsification methods and design choices provides insights on the impact of distillation and other architectural choices.
- **Efficient Distillation Strategy** Avoids manual lay
- **Incremental technical contribution** The core novelty is primarily an engineering combination of ARTrack with existing techniques (self-distillation and attention-based sparsification). While the application is competent and results are solid, the conceptual advance over applying standard acceleration techniques to ARTrack is limited. The paper would benefit from clearer articulation of what makes this combination non-trivial beyond implementation.
- **Missing baseline experimental compariso
1. The paper is well written.
2. Improving tracker efficiency is a good direction in visual object tracking, and the paper makes progress in this field.
3. The use of Inter-frame Autoregressive Sparsification appears to be new.
1. Self-distillation is not new; the concept has already been proposed in different fields.
2. The comparison methods are insufficient: most baselines are from earlier years, with no 2024 works and only one paper from 2025. The paper should include more recent state-of-the-art methods for a fair and comprehensive comparison.
3. There is no analysis of MACs and parameter counts against SOTA methods.
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Human Pose and Action Recognition
