HAD: Hierarchical Asymmetric Distillation to Bridge Spatio-Temporal Gaps in Event-Based Object Tracking
Yao Deng, Xian Zhong, Wenxuan Liu, Zhaofei Yu, Jingling Yuan, Tiejun Huang

TL;DR
HAD introduces a hierarchical asymmetric distillation framework to effectively bridge the spatio-temporal gaps between RGB and event cameras, significantly improving multi-modal object tracking performance.
Contribution
The paper presents a novel hierarchical alignment strategy for multi-modal knowledge distillation that explicitly addresses spatio-temporal asymmetries between RGB and event data.
Findings
HAD outperforms existing state-of-the-art methods in object tracking tasks.
The hierarchical alignment reduces information loss and enhances multi-modal integration.
Ablation studies confirm the effectiveness of each component in HAD.
Abstract
RGB cameras excel at capturing rich texture details with high spatial resolution, whereas event cameras offer exceptional temporal resolution and a high dynamic range (HDR). Leveraging their complementary strengths can substantially enhance object tracking under challenging conditions, such as high-speed motion, HDR environments, and dynamic background interference. However, a significant spatio-temporal asymmetry exists between these two modalities due to their fundamentally different imaging mechanisms, hindering effective multi-modal integration. To address this issue, we propose Hierarchical Asymmetric Distillation (HAD), a multi-modal knowledge distillation framework that explicitly models and mitigates spatio-temporal asymmetries. Specifically, HAD proposes a hierarchical alignment strategy that minimizes information loss while maintaining the student network's computational…
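The abstract is cut off before it describes the alignment losses, so the sketch below is not the authors' formulation. It only illustrates the general idea of distilling at several feature levels: a student's features are pulled toward a teacher's at each level, and the per-level terms are combined with weights. The function names, the use of mean squared error, and the level weights are all assumptions made for illustration.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hierarchical_distill_loss(
    teacher_feats: Sequence[Sequence[float]],
    student_feats: Sequence[Sequence[float]],
    level_weights: Sequence[float],
) -> float:
    """Weighted sum of per-level MSE terms (hypothetical, not from the paper)."""
    if not (len(teacher_feats) == len(student_feats) == len(level_weights)):
        raise ValueError("expected one weight and one feature pair per level")
    return sum(
        w * mse(t, s)
        for w, t, s in zip(level_weights, teacher_feats, student_feats)
    )


# Toy features at two levels; the values are made up for illustration.
teacher = [[1.0, 2.0], [0.5, 0.5, 0.5]]
student = [[1.0, 0.0], [0.5, 0.5, 0.5]]
loss = hierarchical_distill_loss(teacher, student, [0.5, 1.0])
```

Only the first level differs between teacher and student here, so the total loss comes entirely from that level's weighted term.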
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Human Pose and Action Recognition
