Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking
Milad Khanchi, Maria Amer, and Charalambos Poullis

TL;DR
This paper introduces a depth-aware multiple object tracking framework that leverages monocular depth estimation and hierarchical alignment to improve association accuracy, achieving state-of-the-art results without training.
Contribution
To the authors' knowledge, this is the first MOT framework to incorporate monocular depth (a 3D feature) as an independent feature in the association process, which improves robustness under occlusion.
Findings
Achieves state-of-the-art results on challenging benchmarks.
Operates without any training or fine-tuning.
Incorporates depth as an independent feature for object association.
Abstract
Current motion-based multiple object tracking (MOT) approaches rely heavily on Intersection-over-Union (IoU) for object association. Without using 3D features, they are ineffective in scenarios with occlusions or visually similar objects. To address this, our paper presents a novel depth-aware framework for MOT. We estimate depth using a zero-shot approach and incorporate it as an independent feature in the association process. Additionally, we introduce a Hierarchical Alignment Score that refines IoU by integrating both coarse bounding box overlap and fine-grained (pixel-level) alignment to improve association accuracy without requiring additional learnable parameters. To our knowledge, this is the first MOT framework to incorporate 3D features (monocular depth) as an independent decision matrix in the association step. Our framework achieves state-of-the-art results on challenging…
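The association idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the blend weight `w`, the thresholds `min_align` and `min_depth`, the normalized per-object depth value, the greedy matching step, and all function names are assumptions made for illustration. It combines a coarse box IoU with a pixel-level mask IoU into one alignment score, and keeps depth agreement as a separate check rather than folding it into that score.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pixel = Tuple[int, int]


def rect_mask(x1: int, y1: int, x2: int, y2: int) -> Set[Pixel]:
    """Helper for the example: a rectangular pixel mask as a set of (x, y)."""
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


def box_iou(a: Box, b: Box) -> float:
    """Coarse overlap: intersection-over-union of two bounding boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Fine overlap: intersection-over-union of two pixel masks."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def alignment_score(a: Dict, b: Dict, w: float = 0.5) -> float:
    """Hypothetical blend of coarse and fine overlap (w is an assumption)."""
    return w * box_iou(a["box"], b["box"]) + (1 - w) * mask_iou(a["mask"], b["mask"])


def depth_similarity(a: Dict, b: Dict) -> float:
    """Depth agreement, assuming depths are normalized to [0, 1]."""
    return 1.0 - abs(a["depth"] - b["depth"])


def associate(tracks: List[Dict], detections: List[Dict],
              min_align: float = 0.3, min_depth: float = 0.7) -> List[Tuple[int, int]]:
    """Greedy matching: a pair is a candidate only if BOTH the alignment
    score and the independent depth check pass their thresholds."""
    candidates = []
    for ti, t in enumerate(tracks):
        for di, d in enumerate(detections):
            s = alignment_score(t, d)
            if s >= min_align and depth_similarity(t, d) >= min_depth:
                candidates.append((s, ti, di))
    candidates.sort(reverse=True)  # best alignment first

    matches, used_t, used_d = [], set(), set()
    for _, ti, di in candidates:
        if ti in used_t or di in used_d:
            continue
        matches.append((ti, di))
        used_t.add(ti)
        used_d.add(di)
    return matches


# Two tracks overlap the detection equally in 2D but sit at different depths.
tracks = [
    {"box": (0, 0, 10, 10), "mask": rect_mask(0, 0, 10, 10), "depth": 0.2},
    {"box": (2, 0, 12, 10), "mask": rect_mask(2, 0, 12, 10), "depth": 0.8},
]
detections = [
    {"box": (1, 0, 11, 10), "mask": rect_mask(1, 0, 11, 10), "depth": 0.78},
]
print(associate(tracks, detections))
```

In this toy case the two tracks have identical 2D overlap with the detection, so overlap alone cannot separate them; the depth check rejects the track at the wrong depth. A real tracker would more likely use an optimal assignment solver than greedy matching, which is used here only to keep the sketch dependency-free.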
Taxonomy
Topics: Video Surveillance and Tracking Methods
