Generative-based Fusion Mechanism for Multi-Modal Tracking
Zhangyong Tang, Tianyang Xu, Xuefeng Zhu, Xiao-Jun Wu, Josef Kittler

TL;DR
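This paper introduces a generative fusion mechanism for multi-modal tracking, built on Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Conditioning the multi-modal features on random noise turns the original training samples into harder instances, which helps the fusion block extract discriminative clues. The method reports state-of-the-art results on LasHeR and RGBD1K.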
Contribution
The paper applies generative models to information fusion in multi-modal tracking, a use the authors describe as relatively unexplored, and reports state-of-the-art results on the LasHeR and RGBD1K benchmarks.
Findings
Achieves state-of-the-art performance on the LasHeR and RGBD1K datasets.
Conditioning multi-modal features on random noise transforms training samples into harder instances, which improves tracking accuracy.
Experiments with both CGAN-based and DM-based fusion support the proposed generative fusion approach.
Abstract
Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained relatively unexplored. In this context, we seek to uncover the potential of harnessing generative techniques to address the critical challenge, information fusion, in multi-modal tracking. In this paper, we delve into two prominent GM techniques, namely, Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Different from the standard fusion process where the features from each modality are directly fed into the fusion block, we condition these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from…
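The noise-conditioning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: the function names `make_fusion_input` and `linear_fuse`, the feature sizes, and the single linear layer standing in for a learned generator are all assumptions chosen for illustration. The sketch only shows the general CGAN-style pattern of concatenating features from two modalities with random noise before a fusion block, so that each pass sees a perturbed version of the same sample.

```python
import random


def make_fusion_input(rgb_feat, aux_feat, noise_dim, rng):
    """Concatenate two modality feature vectors with Gaussian noise.

    Hypothetical helper: mimics CGAN-style conditioning, where the
    features act as the condition and the noise is the random input.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return list(rgb_feat) + list(aux_feat) + noise


def linear_fuse(x, weights):
    """Toy fusion block: one linear layer (assumed stand-in for a learned generator)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


rng = random.Random(0)            # fixed seed so the sketch is reproducible
rgb_feat = [0.2, -0.1, 0.5]       # made-up RGB features
aux_feat = [0.4, 0.0, -0.3]       # made-up auxiliary-modality features (e.g. thermal or depth)
noise_dim = 2

x = make_fusion_input(rgb_feat, aux_feat, noise_dim, rng)

out_dim = 3
weights = [[rng.uniform(-0.5, 0.5) for _ in range(len(x))] for _ in range(out_dim)]
fused = linear_fuse(x, weights)

# A second call draws fresh noise, so the same features yield a different input.
x_again = make_fusion_input(rgb_feat, aux_feat, noise_dim, rng)
```

In this toy setup the feature part of `x` and `x_again` is identical while the noise part differs, which is the sense in which the same training sample is presented to the fusion block as a different, potentially harder, instance each time.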
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Music and Audio Processing
Methods: Diffusion
