Generative-based Fusion Mechanism for Multi-Modal Tracking

Zhangyong Tang; Tianyang Xu; Xuefeng Zhu; Xiao-Jun Wu; Josef Kittler

arXiv:2309.01728 · cs.CV · December 1, 2023


TL;DR

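This paper introduces a generative-based fusion mechanism for multi-modal tracking built on Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Conditioning the multi-modal features with random noise transforms training samples into harder instances, which helps the fusion block extract discriminative clues and improves tracking performance.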

Contribution

The paper pioneers the application of generative models for information fusion in multi-modal tracking, demonstrating state-of-the-art results on multiple benchmarks.

Findings

01

Achieved state-of-the-art performance on the LasHeR and RGBD1K datasets.

02

Effective transformation of features into harder instances improves tracking accuracy.

03

Extensive experiments validate the superiority of the proposed generative fusion approach.

Abstract

Generative models (GMs) have received increasing research interest for their remarkable capacity to achieve comprehensive understanding. However, their potential application in the domain of multi-modal tracking has remained relatively unexplored. In this context, we seek to uncover the potential of harnessing generative techniques to address the critical challenge, information fusion, in multi-modal tracking. In this paper, we delve into two prominent GM techniques, namely, Conditional Generative Adversarial Networks (CGANs) and Diffusion Models (DMs). Different from the standard fusion process where the features from each modality are directly fed into the fusion block, we condition these multi-modal features with random noise in the GM framework, effectively transforming the original training samples into harder instances. This design excels at extracting discriminative clues from…
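To make the idea of "conditioning multi-modal features with random noise" concrete, here is a minimal, dependency-free sketch of a DDPM-style training step in which a denoiser is conditioned on two modality features (e.g. RGB plus thermal or depth). It is not the authors' architecture: the linear beta schedule is the standard DDPM one, while the fusion target (the mean of the two features), the function names, and the single-linear-layer `toy_denoiser` are hypothetical choices made only for illustration.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Return T noise variances spaced linearly (standard DDPM schedule)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def alpha_bars(betas):
    """Cumulative products abar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, noise):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * n for x, n in zip(x0, noise)]


def toy_denoiser(x_t, cond_rgb, cond_aux, weights):
    """Hypothetical conditional denoiser: one linear map over the
    concatenation [x_t, rgb, aux] that predicts the injected noise."""
    z = x_t + cond_rgb + cond_aux  # list concatenation, length 3 * D
    return [sum(w * v for w, v in zip(row, z)) for row in weights]


def diffusion_fusion_loss(rgb, aux, weights, abar, rng):
    """Loss for one training step: noise a (hypothetical) fused target and
    ask the denoiser, conditioned on both modalities, to recover the noise."""
    d = len(rgb)
    x0 = [(r + a) / 2.0 for r, a in zip(rgb, aux)]  # hypothetical fusion target
    t = rng.randrange(len(abar))                    # random diffusion timestep
    noise = [rng.gauss(0.0, 1.0) for _ in range(d)]
    x_t = q_sample(x0, t, abar, noise)
    pred = toy_denoiser(x_t, rgb, aux, weights)
    return sum((p - n) ** 2 for p, n in zip(pred, noise)) / d  # noise-prediction MSE
```

A possible usage, with randomly initialised weights and features of dimension 4:

```python
rng = random.Random(0)
D = 4
abar = alpha_bars(linear_beta_schedule(100))
weights = [[rng.gauss(0.0, 0.1) for _ in range(3 * D)] for _ in range(D)]
rgb = [rng.gauss(0.0, 1.0) for _ in range(D)]
tir = [rng.gauss(0.0, 1.0) for _ in range(D)]
print(diffusion_fusion_loss(rgb, tir, weights, abar, rng))
```

Because the denoiser only ever sees a noise-corrupted sample alongside the clean modality features, it cannot rely on the original sample directly. This is one plausible reading of how the noise turns each training sample into a harder instance.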


Code & Models

Repositories

zhangyong-tang/gmmt (official)


Taxonomy

Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Music and Audio Processing

Methods: Diffusion