FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking

Jinlin You; Muyu Li; Xudong Zhao

arXiv:2604.14526 · cs.CV · April 17, 2026

TL;DR

FreqTrack introduces a frequency-aware transformer framework for RGB-event object tracking, leveraging frequency-domain transformations and wavelet-based edge refinement to improve robustness in complex scenes.

Contribution

The paper proposes a novel frequency-domain fusion method with a spectral enhancement transformer and wavelet edge refinement for improved RGB-event tracking.
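As a rough illustration of why wavelets suit edge-oriented processing, the sketch below computes a single-level 2-D Haar transform with NumPy. It is a generic textbook decomposition, not the paper's WER module; the function name `haar_dwt2` and the sub-band names are assumptions made for this example.

```python
import numpy as np

def haar_dwt2(img):
    """Single-level orthonormal 2-D Haar transform of an (H, W) array.

    Hypothetical helper for illustration only; H and W must be even.
    Returns one approximation band and three detail bands, each (H/2, W/2).
    """
    img = np.asarray(img, dtype=float)
    # The four pixels of every non-overlapping 2x2 block.
    a = img[0::2, 0::2]  # top-left
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0       # low-pass in both directions
    detail_cols = (a - b + c - d) / 2.0  # change across columns (vertical edges)
    detail_rows = (a + b - c - d) / 2.0  # change across rows (horizontal edges)
    detail_diag = (a - b - c + d) / 2.0  # diagonal change
    return approx, detail_cols, detail_rows, detail_diag

# A vertical step edge that falls inside the 2x2 blocks.
step = np.zeros((4, 4))
step[:, 1:] = 1.0
approx, detail_cols, detail_rows, detail_diag = haar_dwt2(step)
```

In this toy input only `detail_cols` is nonzero, which is the sense in which detail sub-bands isolate edge information from smooth content.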

Findings

01

Achieves 76.6% precision on the COESOT benchmark.

02

Outperforms existing RGB-event tracking methods.

03

Effectively models high-frequency event data in challenging scenarios.

Abstract

Existing single-modal RGB trackers often face performance bottlenecks in complex dynamic scenes, while the introduction of event sensors offers new potential for enhancing tracking capabilities. However, most current RGB-event fusion methods, primarily designed in the spatial domain using convolutional, Transformer, or Mamba architectures, fail to fully exploit the unique temporal response and high-frequency characteristics of event data. To address this, we propose FreqTrack, a frequency-aware RGBE tracking framework that establishes complementary inter-modal correlations through frequency-domain transformations for more robust feature fusion. We design a Spectral Enhancement Transformer (SET) layer that incorporates multi-head dynamic Fourier filtering to adaptively enhance and select frequency-domain features. Additionally, we develop a Wavelet Edge Refinement (WER) module, which…
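The abstract does not spell out how the SET layer applies its filters, so the following is only a minimal sketch of the general idea of multi-head filtering in the Fourier domain, written with NumPy. The function name `multi_head_fourier_filter`, the tensor layout, and the choice to pass the per-head gains in as an argument are all assumptions; in a dynamic variant the gains would presumably be predicted from the input features rather than fixed.

```python
import numpy as np

def multi_head_fourier_filter(x, gains):
    """Apply one frequency-domain gain mask per head to a feature map.

    Hypothetical illustration, not the paper's SET layer.
    x:     (H, W, C) real feature map.
    gains: (heads, H, W // 2 + 1) real multipliers, one mask per head.
    Channels are split evenly across heads.
    """
    h, w, c = x.shape
    heads = gains.shape[0]
    if c % heads != 0:
        raise ValueError("channel count must be divisible by the number of heads")
    wf = w // 2 + 1
    # Real 2-D FFT over the two spatial axes -> (H, W//2+1, C), complex.
    spec = np.fft.rfft2(x, axes=(0, 1))
    # Group channels by head so each group shares one mask.
    spec = spec.reshape(h, wf, heads, c // heads)
    # (heads, H, Wf) -> (H, Wf, heads, 1) so the mask broadcasts over channels.
    mask = np.transpose(gains, (1, 2, 0))[..., None]
    spec = (spec * mask).reshape(h, wf, c)
    # Back to the spatial domain with the original size.
    return np.fft.irfft2(spec, s=(h, w), axes=(0, 1))

# Example: head 0 passes everything, head 1 keeps only the DC component.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8, 4))
gains = np.zeros((2, 8, 5))
gains[0] = 1.0
gains[1, 0, 0] = 1.0
out = multi_head_fourier_filter(feat, gains)
```

With these example gains, the first two channels pass through unchanged while the last two collapse to their spatial means, showing how separate heads can select different frequency bands from the same input.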
