An Efficient Token Compression Framework for Visual Object Tracking

Weijing Wu; Qihua Liang; Bineng Zhong; Haiying Xia; Zhiyi Mo; Shuxiang Song

arXiv:2605.08329·cs.CV·May 12, 2026

TL;DR

ETCTrack introduces a dynamic token compression framework for visual tracking that reduces computational costs while maintaining high accuracy by filtering redundant features and enabling adaptive interaction.

Contribution

The paper proposes a novel compress-then-interact framework with an Adaptive Token Compressor and Hierarchical Interaction Encoder for efficient, high-performance visual tracking.
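The compress-then-interact ordering can be illustrated with a minimal data-flow sketch. This is not the paper's Adaptive Token Compressor, which is learned. The fixed score-based top-k rule below is a handcrafted stand-in, used only to show that template tokens are reduced before they are joined with search tokens for interaction. The names `keep_top_tokens` and `compress_then_interact`, the `keep_ratio` parameter, and the toy scores are all hypothetical.

```python
from typing import List, Sequence

Token = List[float]


def keep_top_tokens(tokens: Sequence[Token], scores: Sequence[float],
                    keep_ratio: float) -> List[Token]:
    """Keep the highest-scoring fraction of tokens, preserving their order."""
    if len(tokens) != len(scores):
        raise ValueError("one score per token is required")
    n_keep = max(1, round(len(tokens) * keep_ratio))
    # Indices of the n_keep largest scores (handcrafted stand-in rule).
    top = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)[:n_keep]
    return [tokens[i] for i in sorted(top)]


def compress_then_interact(template_tokens: Sequence[Token],
                           template_scores: Sequence[float],
                           search_tokens: Sequence[Token],
                           keep_ratio: float = 0.4) -> List[Token]:
    """Compress template tokens first, then build the interaction input."""
    compact = keep_top_tokens(template_tokens, template_scores, keep_ratio)
    # The interaction stage (e.g. attention) would consume this shorter sequence.
    return compact + list(search_tokens)


# Toy data: 10 one-dimensional template tokens and 2 search tokens.
template = [[float(i)] for i in range(10)]
scores = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6, 0.5, 0.0]
search = [[100.0], [101.0]]
sequence = compress_then_interact(template, scores, search)
```

With `keep_ratio=0.4`, the interaction stage sees 4 template tokens instead of 10. A learned compressor would replace the fixed scoring rule while keeping the same ordering of stages.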

Findings

01 Reduces template tokens by 60%

02 Achieves a 21.4% reduction in MACs (multiply-accumulate operations)

03 Incurs only a 0.4% accuracy drop on benchmarks
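A rough worked example may help relate the token reduction to the compute reduction. The sketch below counts only the query-key pairs of one joint self-attention layer, which grow quadratically with sequence length. The token counts (4 template frames of 64 tokens each, 256 search tokens) are hypothetical and not taken from the paper, and the function names are illustrative.

```python
def joint_attention_pairs(n_template: int, n_search: int) -> int:
    """Query-key pairs in one self-attention layer over the joint sequence."""
    n = n_template + n_search
    return n * n


def compress(n_template: int, reduction: float) -> int:
    """Template tokens left after removing a `reduction` fraction of them."""
    return round(n_template * (1.0 - reduction))


# Hypothetical sizes, chosen for illustration only.
n_template = 4 * 64   # four template frames, 64 tokens each
n_search = 256        # search-region tokens

kept = compress(n_template, 0.60)                 # 60% of template tokens removed
before = joint_attention_pairs(n_template, n_search)
after = joint_attention_pairs(kept, n_search)
saving = 1.0 - after / before                     # fraction of pairs avoided

print(f"template tokens: {n_template} -> {kept}")
print(f"attention pairs: {before} -> {after} ({saving:.1%} fewer)")
```

This toy count will not match the reported 21.4% figure. Total MACs also include projections and feed-forward layers, whose cost grows linearly with token count, and search tokens are left uncompressed, so the overall saving depends on the real architecture and token budget.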

Abstract

Eliminating feature-level redundancy in visual representations is crucial for jointly optimizing the performance and computational cost of visual tracking models. To improve performance, many contemporary Transformer-based trackers use a larger number of historical template frames to capture richer spatio-temporal cues. However, this strategy produces a massive number of input visual tokens, which creates two critical issues: it imposes a quadratic computational burden and can also degrade the tracker's overall performance. To address these issues, we propose a compress-then-interact tracking framework, ETCTrack, that learns to efficiently compress template tokens from historical template frames into a robust target representation, moving beyond handcrafted rules. Our method first employs the Adaptive Token Compressor to dynamically construct…

Code & Models

Repositories

PJD-WJ/ETCTrack (GitHub)
