DM$^3$T: Harmonizing Modalities via Diffusion for Multi-Object Tracking
Weiran Li, Yeqiang Liu, Yijie Wei, Mina Han, Qiannan Guo, Zhenbo Li

TL;DR
DM$^3$T introduces a diffusion-inspired multimodal fusion framework for multi-object tracking, achieving superior accuracy by iteratively harmonizing features from visible and thermal modalities.
Contribution
The paper presents a novel diffusion-based approach for multimodal feature fusion in MOT, enabling deeper integration and improved tracking robustness over traditional methods.
Findings
Achieves 41.7 HOTA on the VT-MOT benchmark, outperforming existing state-of-the-art methods.
Introduces a Cross-Modal Diffusion Fusion module for iterative feature alignment.
Employs a Diffusion Refiner to enhance feature representation.
Abstract
Multi-object tracking (MOT) is a fundamental task in computer vision with critical applications in autonomous driving and robotics. Multimodal MOT that integrates visible light and thermal infrared information is particularly essential for robust autonomous driving systems. However, effectively fusing these heterogeneous modalities is challenging. Simple strategies like concatenation or addition often fail to bridge the significant non-linear distribution gap between their feature representations, which can lead to modality conflicts and degrade tracking accuracy. Drawing inspiration from the connection between multimodal MOT and the iterative refinement in diffusion models, this paper proposes DM$^3$T, a novel framework that reformulates multimodal fusion as an iterative feature alignment process to generate accurate and temporally coherent object trajectories. Our approach performs…
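To make the contrast between one-shot fusion and iterative alignment concrete, here is a toy sketch in plain Python. It is not the paper's Cross-Modal Diffusion Fusion module, whose details are not given on this page. The function names, the linear update rule, and the `steps` and `rate` parameters are all assumptions chosen for illustration.

```python
import math


def l2_gap(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fuse_by_addition(visible, thermal):
    """One-shot baseline: element-wise addition of the two modalities."""
    return [x + y for x, y in zip(visible, thermal)]


def fuse_iteratively(visible, thermal, steps=5, rate=0.2):
    """Toy iterative alignment (hypothetical update rule, not from the paper).

    At every step each modality moves a fraction `rate` of the way toward
    the other, so the gap between them shrinks before the final fusion.
    """
    v, t = list(visible), list(thermal)
    gaps = [l2_gap(v, t)]
    for _ in range(steps):
        # Both updates are computed from the previous state of v and t.
        v, t = (
            [x + rate * (y - x) for x, y in zip(v, t)],
            [y + rate * (x - y) for x, y in zip(v, t)],
        )
        gaps.append(l2_gap(v, t))
    fused = [(x + y) / 2 for x, y in zip(v, t)]
    return fused, gaps


# Made-up feature vectors that disagree in the first two channels.
visible = [1.0, 0.0, 2.0]
thermal = [0.0, 3.0, 2.0]
fused, gaps = fuse_iteratively(visible, thermal)
```

With this linear rule the gap shrinks by a factor of `1 - 2 * rate` per step, and the fused vector ends up equal to the plain average of the inputs. The toy therefore shows only the control flow of step-by-step alignment. Any accuracy benefit would have to come from a learned, non-linear update in place of the fixed `rate`.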
Taxonomy
Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Autonomous Vehicle Technology and Safety
