Depth-Aware Scoring and Hierarchical Alignment for Multiple Object Tracking

Milad Khanchi, Maria Amer, and Charalambos Poullis

arXiv:2506.00774 · cs.CV · June 3, 2025

TL;DR

This paper introduces a depth-aware multiple object tracking framework that leverages monocular depth estimation and hierarchical alignment to improve association accuracy, achieving state-of-the-art results without training.

Contribution

To the authors' knowledge, it is the first MOT framework to use monocular depth (a 3D feature) as an independent feature in the association step, with the aim of making association more robust under occlusion.

Findings

01. Achieves state-of-the-art results on challenging benchmarks.

02. Operates without any training or fine-tuning.

03. Incorporates depth as an independent feature for object association.

Abstract

Current motion-based multiple object tracking (MOT) approaches rely heavily on Intersection-over-Union (IoU) for object association. Without using 3D features, they are ineffective in scenarios with occlusions or visually similar objects. To address this, our paper presents a novel depth-aware framework for MOT. We estimate depth using a zero-shot approach and incorporate it as an independent feature in the association process. Additionally, we introduce a Hierarchical Alignment Score that refines IoU by integrating both coarse bounding box overlap and fine-grained (pixel-level) alignment to improve association accuracy without requiring additional learnable parameters. To our knowledge, this is the first MOT framework to incorporate 3D features (monocular depth) as an independent decision matrix in the association step. Our framework achieves state-of-the-art results on challenging…
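
The two ideas in the abstract, a score that blends coarse box overlap with pixel-level overlap and a separate depth matrix used during association, can be illustrated with a small sketch. This is not the authors' formulation: the blend weight `alpha`, the `depth_similarity` formula, the thresholds, and the depth-gating rule are all assumptions made for illustration. Greedy matching stands in for whatever assignment solver the paper uses. Masks are represented as sets of pixel coordinates to keep the example dependency-free.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Pixels = Set[Tuple[int, int]]            # foreground pixel coordinates of a mask


def box_iou(a: Box, b: Box) -> float:
    """Coarse overlap: IoU of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a: Pixels, b: Pixels) -> float:
    """Fine overlap: IoU of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def hierarchical_score(t: Dict, d: Dict, alpha: float = 0.5) -> float:
    """Hypothetical blend of coarse and fine overlap; alpha is an assumed weight."""
    return alpha * box_iou(t["box"], d["box"]) + (1.0 - alpha) * mask_iou(t["mask"], d["mask"])


def depth_similarity(d_a: float, d_b: float) -> float:
    """Hypothetical depth agreement in [0, 1] for positive depths; 1 means equal."""
    m = max(abs(d_a), abs(d_b))
    return max(0.0, 1.0 - abs(d_a - d_b) / m) if m > 0 else 1.0


def associate(tracks: List[Dict], dets: List[Dict],
              min_score: float = 0.3, min_depth_sim: float = 0.7) -> List[Tuple[int, int]]:
    """Greedy matching on the spatial matrix, gated by a separate depth matrix."""
    spatial = [[hierarchical_score(t, d) for d in dets] for t in tracks]
    depth = [[depth_similarity(t["depth"], d["depth"]) for d in dets] for t in tracks]
    # Keep only pairs that pass both matrices, best spatial score first.
    candidates = sorted(
        ((spatial[i][j], i, j)
         for i in range(len(tracks)) for j in range(len(dets))
         if spatial[i][j] >= min_score and depth[i][j] >= min_depth_sim),
        reverse=True,
    )
    matches, used_t, used_d = [], set(), set()
    for _, i, j in candidates:
        if i not in used_t and j not in used_d:
            matches.append((i, j))
            used_t.add(i)
            used_d.add(j)
    return matches


def rect_pixels(x1: int, y1: int, x2: int, y2: int) -> Pixels:
    """Helper for the toy data: a filled rectangular mask."""
    return {(x, y) for x in range(x1, x2) for y in range(y1, y2)}


# Toy occlusion case: two tracks overlap one detection almost equally in 2D.
tracks = [
    {"box": (0, 0, 10, 10), "mask": rect_pixels(0, 0, 10, 10), "depth": 2.0},
    {"box": (1, 0, 11, 10), "mask": rect_pixels(1, 0, 11, 10), "depth": 8.0},
]
detections = [
    {"box": (0, 0, 10, 10), "mask": rect_pixels(0, 0, 10, 10), "depth": 7.5},
]
matches = associate(tracks, detections)
print(matches)
```

In this toy case the detection overlaps track 0 perfectly in the image plane, but its depth agrees only with track 1, so the depth gate rejects the first pairing and the detection is assigned to track 1. An overlap-only tracker would have picked track 0.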

Taxonomy

Topics: Video Surveillance and Tracking Methods