DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker
Jiapeng Wu, Yichen Liu

TL;DR
DepthMOT introduces a novel multi-object tracking approach that leverages depth estimation and camera pose compensation to improve tracking accuracy in crowded and dynamic scenes.
Contribution
The paper presents an end-to-end method for depth estimation and camera motion compensation, enhancing multi-object tracking performance in challenging scenarios.
Findings
Superior performance on VisDrone-MOT dataset
Effective depth estimation integrated into tracking
Improved handling of occlusions and camera motion
Abstract
Accurately distinguishing each object is a fundamental goal of multi-object tracking (MOT) algorithms. However, achieving this goal remains challenging, primarily for two reasons. (i) In crowded scenes with occluded objects, the high overlap of object bounding boxes leads to confusion among closely located objects. Nevertheless, humans naturally perceive the depth of elements in a scene when observing 2D videos. Inspired by this, even though the bounding boxes of objects are close on the camera plane, we can differentiate them in the depth dimension, thereby establishing a 3D perception of the objects. (ii) In videos with rapid, irregular camera motion, abrupt changes in object positions can result in ID switches. However, if the camera pose is known, we can compensate for the errors in linear motion models. In this paper, we propose DepthMOT, which achieves: (i) detecting and…
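To illustrate point (i) of the abstract, the following is a minimal sketch of how a per-object depth value could separate two heavily overlapping boxes during track-to-detection association. It is not the paper's method: the names `depth_aware_cost` and `greedy_match`, the relative depth-gap term, the `depth_weight` parameter, and the greedy matcher are all assumptions made for this example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def depth_aware_cost(track_boxes, track_depths, det_boxes, det_depths,
                     depth_weight=0.5):
    """Build a cost matrix mixing (1 - IoU) with a relative depth gap.

    depth_weight is a hypothetical blending factor chosen for this sketch:
    0.0 reduces to plain IoU matching, 1.0 uses depth only.
    """
    cost = []
    for t_box, t_depth in zip(track_boxes, track_depths):
        row = []
        for d_box, d_depth in zip(det_boxes, det_depths):
            iou_cost = 1.0 - iou(t_box, d_box)
            # Relative gap in [0, 1) so the term is scale independent.
            depth_gap = abs(t_depth - d_depth) / max(t_depth, d_depth, 1e-6)
            row.append((1.0 - depth_weight) * iou_cost
                       + depth_weight * depth_gap)
        cost.append(row)
    return cost


def greedy_match(cost):
    """Greedily assign each track to the cheapest still-free detection.

    A simple stand-in for an optimal assignment solver.
    """
    pairs = sorted(
        (c, i, j) for i, row in enumerate(cost) for j, c in enumerate(row)
    )
    matches, used_dets = {}, set()
    for _, i, j in pairs:
        if i not in matches and j not in used_dets:
            matches[i] = j
            used_dets.add(j)
    return matches
```

With `depth_weight=0.0` the sketch falls back to IoU-only matching, so two objects that swap image positions while staying at different depths can trade identities. A non-zero weight lets the depth term break that ambiguity.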
Taxonomy
Topics: Advanced Neural Network Applications · Hand Gesture Recognition Systems · Brain Tumor Detection and Classification
