Joint Detection and Tracking in Videos with Identification Features
Bharti Munjal, Abdul Rafey Aftab, Sikandar Amin, Meltem D. Brandlmaier, Federico Tombari, Fabio Galasso

TL;DR
This paper introduces a novel joint detection, tracking, and re-identification method for videos that maintains high performance even at low frame rates, making it suitable for embedded devices with limited computational resources.
Contribution
It presents the first joint optimization approach for detection, tracking, and re-identification that preserves detector performance and is effective in low-frame-rate scenarios.
Findings
Achieves state-of-the-art results on the MOT benchmark.
Ranks 1st in the UA-DETRAC'18 online tracking challenge.
Demonstrates robustness in low-frame-rate and occlusion scenarios.
Abstract
Recent works have shown that combining object detection and tracking tasks, in the case of video data, results in higher performance for both tasks, but they strictly require a high frame-rate. This assumption is often violated in real-world applications, where models run on embedded devices, often at only a few frames per second. Videos at low frame-rate suffer from large object displacements. Here re-identification features may help to match detections of largely displaced objects, but current joint detection and re-identification formulations degrade the detector performance, as these two are contrasting tasks. In real-world applications, having separate detector and re-id models is often not feasible, as both the memory and runtime effectively double. Towards robust long-term tracking applicable to reduced-computational-power devices, we propose…
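To make the low-frame-rate argument concrete, here is a minimal sketch of why appearance features can associate detections when box overlap cannot. It is not the paper's method: the function names, the greedy one-to-one assignment, the 0.5 similarity threshold, and the toy two-dimensional embeddings are all assumptions chosen for illustration.

```python
import math


def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_by_embedding(prev_embs, curr_embs, min_similarity=0.5):
    """Greedy one-to-one matching on embedding similarity (hypothetical helper).

    Returns sorted (prev_index, curr_index) pairs. The threshold is an
    arbitrary illustrative value, not one taken from the paper.
    """
    candidates = []
    for i, p in enumerate(prev_embs):
        for j, c in enumerate(curr_embs):
            sim = cosine_similarity(p, c)
            if sim >= min_similarity:
                candidates.append((sim, i, j))
    candidates.sort(reverse=True)  # most similar pairs first

    used_prev, used_curr, matches = set(), set(), []
    for _, i, j in candidates:
        if i in used_prev or j in used_curr:
            continue
        used_prev.add(i)
        used_curr.add(j)
        matches.append((i, j))
    return sorted(matches)


# Toy data: two objects that moved far between consecutive frames,
# so no previous box overlaps any current box.
prev_boxes = [(0, 0, 10, 10), (100, 0, 110, 10)]
curr_boxes = [(140, 0, 150, 10), (40, 0, 50, 10)]  # detector output order differs
prev_embs = [[1.0, 0.0], [0.0, 1.0]]
curr_embs = [[0.1, 0.9], [0.9, 0.1]]

overlaps = [iou(p, c) for p in prev_boxes for c in curr_boxes]
matches = match_by_embedding(prev_embs, curr_embs)
print(overlaps)
print(matches)
```

In this toy case every IoU is zero, so an overlap-based tracker has nothing to associate, while the embeddings still pair each previous detection with its displaced counterpart. A real system would typically use learned embeddings and an optimal assignment rather than this greedy pass.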
