Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion

Hongze Sun; Rui Liu; Wuque Cai; Jun Wang; Yue Wang; Huajin Tang; Yan Cui; Dezhong Yao; Daqing Guo

arXiv:2405.17903·cs.CV·October 24, 2024

TL;DR

This paper introduces a multimodal hybrid tracker (MMHT) that combines frame and event-based data using a hybrid neural network and transformer fusion to improve the reliability of object tracking in challenging scenarios.

Contribution

The paper presents a novel MMHT model that employs a hybrid ANN and SNN backbone with transformer-based feature fusion for enhanced multimodal object tracking.
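To make the two ingredients named above more concrete, here is a minimal, standard-library-only sketch. It assumes a leaky integrate-and-fire (LIF) unit as the spiking building block and plain scaled dot-product cross-attention as the fusion step. Both are common choices, but this summary does not say which neuron model or attention variant MMHT actually uses. The function names (`lif_spikes`, `cross_attention`) and all parameter values are hypothetical and are not taken from the GuoLab-UESTC/MMHT repository.

```python
import math


def lif_spikes(currents, threshold=1.0, decay=0.5):
    """Hypothetical LIF neuron: turn an input current sequence into a 0/1 spike train."""
    v = 0.0
    spikes = []
    for current in currents:
        v = decay * v + current      # leak the membrane potential, then integrate input
        if v >= threshold:
            spikes.append(1)
            v = 0.0                  # hard reset after emitting a spike
        else:
            spikes.append(0)
    return spikes


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query token attends over all key tokens."""
    d = len(keys[0])
    fused = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        fused.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return fused


# Toy usage: frame-branch tokens query event-branch tokens (all values are made up).
frame_tokens = [[1.0, 0.0], [0.0, 1.0]]
event_tokens = [[float(s), 1.0 - s] for s in lif_spikes([0.6, 0.6, 0.6, 0.6])]
fused_tokens = cross_attention(frame_tokens, event_tokens, event_tokens)
```

In this toy setup the frame tokens act as queries and the event tokens as keys and values, so each fused token is a weighted average of event features. A real tracker would use learned projections, multiple heads, and image-sized feature maps.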

Findings

01

MMHT achieves competitive performance against state-of-the-art methods.

02

Effective multimodal feature extraction improves tracking in low light and cluttered backgrounds.

03

Transformer-based fusion enhances discriminative feature modeling.

Abstract

Visual object tracking, which is primarily based on visible light image sequences, encounters numerous challenges in complicated scenarios, such as low light conditions, high dynamic ranges, and background clutter. To address these challenges, incorporating the advantages of multiple visual modalities is a promising solution for achieving reliable object tracking. However, the existing approaches usually integrate multimodal inputs through adaptive local feature interactions, which cannot leverage the full potential of visual cues, thus resulting in insufficient feature modeling. In this study, we propose a novel multimodal hybrid tracker (MMHT) that utilizes frame-event-based data for reliable single object tracking. The MMHT model employs a hybrid backbone consisting of an artificial neural network (ANN) and a spiking neural network (SNN) to extract dominant features from different…

Code & Models

Repositories

GuoLab-UESTC/MMHT
PyTorch (official)

Taxonomy

Methods: ALIGN