FreqTrack: Frequency Learning based Vision Transformer for RGB-Event Object Tracking
Jinlin You, Muyu Li, Xudong Zhao

TL;DR
FreqTrack introduces a frequency-aware transformer framework for RGB-event object tracking, leveraging frequency-domain transformations and wavelet-based edge refinement to improve robustness in complex scenes.
Contribution
The paper proposes a novel frequency-domain fusion method with a spectral enhancement transformer and wavelet edge refinement for improved RGB-event tracking.
Findings
Achieves 76.6% precision on the COESOT benchmark.
Outperforms existing RGB-event tracking methods.
Effectively models high-frequency event data in challenging scenarios.
Abstract
Existing single-modal RGB trackers often face performance bottlenecks in complex dynamic scenes, while the introduction of event sensors offers new potential for enhancing tracking capabilities. However, most current RGB-event fusion methods, primarily designed in the spatial domain using convolutional, Transformer, or Mamba architectures, fail to fully exploit the unique temporal response and high-frequency characteristics of event data. To address this, we propose FreqTrack, a frequency-aware RGBE tracking framework that establishes complementary inter-modal correlations through frequency-domain transformations for more robust feature fusion. We design a Spectral Enhancement Transformer (SET) layer that incorporates multi-head dynamic Fourier filtering to adaptively enhance and select frequency-domain features. Additionally, we develop a Wavelet Edge Refinement (WER) module, which…
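The idea of multi-head Fourier filtering mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' SET layer: the function names (dft, idft, fourier_filter, multi_head_filter) are hypothetical, the per-head gains are fixed numbers chosen for illustration rather than predicted from the input as a "dynamic" filter would require, and a naive O(n^2) DFT from the standard library stands in for the FFT a real implementation would likely use. It only shows the general pattern: transform each feature channel to the frequency domain, scale its frequency bins with the gains of the head that owns the channel, and transform back.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform of a 1D sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def fourier_filter(signal, gains):
    """Scale each frequency bin of `signal` by the matching entry of `gains`.

    Assumption: `gains` is real and symmetric (gains[k] == gains[n - k]),
    so the filtered signal stays real; any numerical residue in the
    imaginary part is discarded.
    """
    if len(signal) != len(gains):
        raise ValueError("signal and gains must have the same length")
    filtered = [g * s for g, s in zip(gains, dft(signal))]
    return [value.real for value in idft(filtered)]


def multi_head_filter(channels, head_gains):
    """Split `channels` evenly into heads and filter each head with its own gains.

    `head_gains` holds one gain vector per head. Here the gains are fixed;
    in an adaptive design they would be computed from the input features.
    """
    heads = len(head_gains)
    if heads == 0 or len(channels) % heads != 0:
        raise ValueError("number of channels must be divisible by number of heads")
    per_head = len(channels) // heads
    return [
        fourier_filter(channel, head_gains[index // per_head])
        for index, channel in enumerate(channels)
    ]


# Two feature channels of length 4, one channel per head (illustrative values).
features = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
low_pass = [1.0, 0.0, 0.0, 0.0]   # head 0 keeps only the zero-frequency bin
high_pass = [0.0, 1.0, 1.0, 1.0]  # head 1 removes the zero-frequency bin
filtered = multi_head_filter(features, [low_pass, high_pass])
```

In this toy setup the low-pass head reduces its channel to the channel mean, while the high-pass head keeps only the deviations from the mean, which is the kind of rapidly varying content the abstract associates with event data.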
