TL;DR
This paper introduces a multimodal hybrid tracker (MMHT) that fuses frame-based and event-based data, using a hybrid backbone of an artificial neural network (ANN) and a spiking neural network (SNN) together with transformer-based feature fusion, to make single object tracking more reliable in challenging scenarios.
Contribution
The paper presents a novel MMHT model that employs a hybrid ANN and SNN backbone with transformer-based feature fusion for enhanced multimodal object tracking.
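The two ingredients named here, a spiking branch for event data and attention-based fusion with frame features, can be illustrated with a toy sketch. This is not the authors' implementation: the leaky integrate-and-fire (LIF) neuron, the single-head scaled dot-product cross-attention, the function names, and all values below are assumptions chosen for illustration, since the summary does not specify the neuron model or the fusion layout.

```python
import math

def lif_spikes(inputs, threshold=1.0, decay=0.5):
    """Leaky integrate-and-fire neuron: one 0/1 spike per time step.
    Assumption: LIF is a common SNN neuron model, not confirmed by the source."""
    v = 0.0
    spikes = []
    for x in inputs:
        v = decay * v + x      # leak the membrane potential, then integrate input
        if v >= threshold:
            spikes.append(1)
            v = 0.0            # hard reset after firing
        else:
            spikes.append(0)
    return spikes

def softmax(xs):
    m = max(xs)                # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query attends over all key/value tokens."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused

# Hypothetical toy inputs: two frame tokens (stand-ins for ANN features) and
# two event-driven input currents over three time steps (fed to the LIF neuron).
frame_tokens = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
event_currents = [[0.6, 0.8, 0.1], [1.2, 0.1, 1.0]]
event_tokens = [[float(s) for s in lif_spikes(c)] for c in event_currents]

# Frame tokens act as queries; spike-train tokens act as keys and values.
fused = cross_attention(frame_tokens, event_tokens, event_tokens)
```

In this sketch each fused token is a weighted average of the event tokens, with more weight on the event token that best matches the querying frame token. A real tracker would use learned projections, multiple heads, and image-sized feature maps.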
Findings
MMHT achieves competitive performance against state-of-the-art methods.
Effective multimodal feature extraction improves tracking under low-light conditions and against cluttered backgrounds.
Transformer-based fusion enhances discriminative feature modeling.
Abstract
Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different…
Taxonomy
Methods: ALIGN
