Event Stream-based Visual Object Tracking: HDETrack V2 and A High-Definition Benchmark
Shiao Wang, Xiao Wang, Chao Wang, Liye Jin, Lin Zhu, Bo Jiang, Yonghong Tian, Jin Tang

TL;DR
This paper introduces HDETrack V2, a high-definition event-based tracking method utilizing hierarchical knowledge distillation, temporal Fourier transforms, and test-time tuning, validated on a new high-resolution dataset, EventVOT.
Contribution
The paper presents a novel high-resolution event-based tracking dataset and an advanced Transformer-based tracking method with hierarchical knowledge distillation and test-time tuning.
Findings
Effective tracking on high-resolution event data
Superior performance on multiple datasets
Validated robustness and accuracy
Abstract
We then introduce a novel hierarchical knowledge distillation strategy that incorporates the similarity matrix, feature representation, and response map-based distillation to guide the learning of the student Transformer network. We also enhance the model's ability to capture temporal dependencies by applying the temporal Fourier transform to establish temporal relationships between video frames. We adapt the network model to specific target objects during testing via a newly proposed test-time tuning strategy to achieve high performance and flexibility in target tracking. Recognizing the limitations of existing event-based tracking datasets, which are predominantly low-resolution, we propose EventVOT, the first large-scale high-resolution event-based tracking dataset. It comprises 1141 videos spanning diverse categories such as pedestrians, vehicles, UAVs, ping pong, etc. Extensive…
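To make the two mechanisms in the abstract more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the three distillation terms are combined as a weighted sum (mean squared error on token similarity matrices and on features, KL divergence on temperature-softened response maps), and that the temporal Fourier step is a per-frequency gating of frame features along the time axis. The function names, the loss weights, the temperature, and the `gate` parameter are all hypothetical choices for illustration; the exact formulation is not given in the text above.

```python
import numpy as np


def similarity_matrix(tokens):
    """Cosine similarity between all token pairs; tokens has shape (N, C)."""
    normed = tokens / (np.linalg.norm(tokens, axis=-1, keepdims=True) + 1e-8)
    return normed @ normed.T


def softmax(x, temperature=1.0):
    """Numerically stable softmax over a 1-D array."""
    z = x / temperature
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


def hierarchical_distillation_loss(student_tokens, teacher_tokens,
                                   student_response, teacher_response,
                                   weights=(1.0, 1.0, 1.0), temperature=2.0):
    """Weighted sum of three distillation terms (weights are assumptions)."""
    w_sim, w_feat, w_resp = weights
    # 1) Similarity-matrix term: match pairwise token relations.
    sim_loss = np.mean((similarity_matrix(student_tokens)
                        - similarity_matrix(teacher_tokens)) ** 2)
    # 2) Feature term: match the token features directly.
    feat_loss = np.mean((student_tokens - teacher_tokens) ** 2)
    # 3) Response-map term: KL(teacher || student) on softened maps.
    p_t = softmax(teacher_response.ravel(), temperature)
    p_s = softmax(student_response.ravel(), temperature)
    resp_loss = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)))
    return w_sim * sim_loss + w_feat * feat_loss + w_resp * resp_loss


def temporal_fourier_mix(frame_features, gate):
    """Gate frame features in the temporal frequency domain.

    frame_features: (T, C) array, one feature vector per frame.
    gate: (T // 2 + 1, C) array, a hypothetical per-frequency filter
          that would be learned in a real model.
    """
    num_frames = frame_features.shape[0]
    spectrum = np.fft.rfft(frame_features, axis=0)
    return np.fft.irfft(spectrum * gate, n=num_frames, axis=0)


rng = np.random.default_rng(0)
teacher = rng.normal(size=(16, 8))
student = teacher + 0.1 * rng.normal(size=(16, 8))
teacher_map = rng.normal(size=(4, 4))
student_map = teacher_map + 0.1 * rng.normal(size=(4, 4))
loss = hierarchical_distillation_loss(student, teacher, student_map, teacher_map)

frames = rng.normal(size=(5, 8))
identity_gate = np.ones((5 // 2 + 1, 8))
mixed = temporal_fourier_mix(frames, identity_gate)
```

With an all-ones gate the Fourier step returns its input unchanged, which is a convenient sanity check; any other gate reweights slow and fast temporal variations across frames.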
Taxonomy
Topics: Advanced Memory and Neural Computing · Brain Tumor Detection and Classification · CCD and CMOS Imaging Sensors
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Knowledge Distillation · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Absolute Position Encodings · Label Smoothing
