DiffusionTrack: Diffusion Model For Multi-Object Tracking
Run Luo, Zikai Song, Lintao Ma, Jinlin Wei, Wei Yang, Min Yang

TL;DR
DiffusionTrack introduces a diffusion model-based framework for multi-object tracking that jointly performs detection and association through a progressive denoising process, improving robustness and flexibility.
Contribution
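The paper proposes a diffusion-based approach to joint detection and tracking. It targets three problems the authors identify in existing two-stage and one-stage trackers: global or local inconsistency, a poor trade-off between robustness and model complexity, and limited flexibility across different scenes within the same video.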
Findings
Achieves competitive results on MOT benchmarks
Effectively discriminates between multiple objects
Flexible one-step or multi-step inference process
Abstract
Multi-object tracking (MOT) is a challenging vision task that aims to detect individual objects within a single frame and associate them across multiple frames. Recent MOT approaches can be categorized into two-stage tracking-by-detection (TBD) methods and one-stage joint detection and tracking (JDT) methods. Despite the success of these approaches, they also suffer from common problems, such as harmful global or local inconsistency, poor trade-off between robustness and model complexity, and lack of flexibility in different scenes within the same video. In this paper we propose a simple but robust framework that formulates object detection and association jointly as a consistent denoising diffusion process from paired noise boxes to paired ground-truth boxes. This novel progressive denoising diffusion strategy substantially augments the tracker's effectiveness, enabling it to…
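The formulation in the abstract, denoising from paired noise boxes to paired ground-truth boxes, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The cosine noise schedule, the normalized (cx, cy, w, h) box layout, the DDIM-style deterministic update, and every function name below are assumptions chosen for illustration, and the `denoiser` argument stands in for a trained network that this summary does not describe.

```python
import numpy as np

def cosine_alpha_bar(num_steps: int, s: float = 0.008) -> np.ndarray:
    """Cumulative signal-retention schedule (cosine form; an assumption here)."""
    t = np.linspace(0, num_steps, num_steps + 1) / num_steps
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return (f / f[0])[1:]  # shape (num_steps,), decreasing from ~1 toward 0

def noise_paired_boxes(paired_boxes, t, alpha_bar, rng):
    """Forward diffusion: sample x_t given clean paired boxes x_0.

    paired_boxes has shape (N, 2, 4): for each of N objects, one box in
    frame k and one in frame k+1 (hypothetical layout: cx, cy, w, h).
    """
    noise = rng.standard_normal(paired_boxes.shape)
    a = alpha_bar[t]
    return np.sqrt(a) * paired_boxes + np.sqrt(1.0 - a) * noise, noise

def denoise(x_t, denoiser, alpha_bar, num_sampling_steps):
    """Reverse process with a deterministic DDIM-style update.

    denoiser(x_t, t) must return an estimate of the clean paired boxes.
    num_sampling_steps=1 gives one-step inference; larger values refine
    the estimate progressively.
    """
    last = len(alpha_bar) - 1
    # Evenly spaced timesteps from the noisiest step down to step 0.
    steps = np.linspace(last, 0, num_sampling_steps).round().astype(int)
    for i, t in enumerate(steps):
        x0_hat = denoiser(x_t, t)
        if i == len(steps) - 1:
            return x0_hat
        a_t, a_prev = alpha_bar[t], alpha_bar[steps[i + 1]]
        # Noise implied by the current estimate, then re-noise to the next level.
        eps_hat = (x_t - np.sqrt(a_t) * x0_hat) / np.sqrt(1.0 - a_t)
        x_t = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps_hat

# Usage with a stand-in "oracle" denoiser that simply returns the ground truth.
rng = np.random.default_rng(0)
gt_pairs = rng.uniform(0.1, 0.9, size=(5, 2, 4))
alpha_bar = cosine_alpha_bar(1000)
x_T, _ = noise_paired_boxes(gt_pairs, 999, alpha_bar, rng)
one_step = denoise(x_T, lambda x, t: gt_pairs, alpha_bar, 1)
multi_step = denoise(x_T, lambda x, t: gt_pairs, alpha_bar, 4)
```

Because each sample carries a pair of boxes rather than a single box, recovering the pair amounts to detecting the object in both frames and associating the two detections at once. The same `denoise` routine covers one-step and multi-step inference by changing only `num_sampling_steps`.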
Taxonomy
Topics: Video Surveillance and Tracking Methods · Fire Detection and Safety Systems · Infrared Target Detection Methodologies
Methods: Diffusion
