Towards Real-Time Multi-Object Tracking
Zhongdao Wang, Liang Zheng, Yixuan Liu, Yali Li, Shengjin Wang

TL;DR
This paper introduces a real-time multi-object tracking system that integrates detection and appearance embedding into a single model, significantly improving speed while maintaining competitive accuracy.
Contribution
The authors propose a unified model for detection and embedding in MOT, enabling near real-time performance with reduced computational cost.
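The efficiency argument behind a unified model can be illustrated with a toy data-flow sketch. It assumes that the separate setup runs a detector plus a re-identification (re-ID) model, and that the joint setup runs one shared backbone feeding two lightweight heads. All names here (`Counted`, `sde_step`, `joint_step`, and the toy box and embedding functions) are hypothetical, and call counts stand in for full network passes. The sketch shows the shape of the idea, not the authors' architecture.

```python
import math
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h)


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so cosine similarity becomes a dot product."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


class Counted:
    """Wraps a callable and counts how often it runs (a stand-in for network passes)."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        return self.fn(*args)


def sde_step(frame, detector: Counted, reid: Counted):
    """Separate detection and embedding: two networks run on every frame."""
    boxes = detector(frame)      # network 1: detector
    embs = reid(frame, boxes)    # network 2: re-ID model on the detected crops
    return [(b, l2_normalize(e)) for b, e in zip(boxes, embs)]


def joint_step(frame, backbone: Counted, det_head: Callable, emb_head: Callable):
    """Joint model: one shared backbone pass feeds both lightweight heads."""
    feats = backbone(frame)      # the only full network pass
    boxes = det_head(feats)
    embs = emb_head(feats, boxes)
    return [(b, l2_normalize(e)) for b, e in zip(boxes, embs)]


# Hypothetical toy components: three fixed boxes and 2-D "embeddings".
TOY_BOXES: List[Box] = [(0, 0, 10, 20), (30, 5, 10, 20), (60, 8, 10, 20)]


def toy_boxes(_features_or_frame) -> List[Box]:
    return TOY_BOXES


def toy_embeddings(_features_or_frame, boxes: List[Box]) -> List[List[float]]:
    return [[b[0] + 1.0, 1.0] for b in boxes]


detector, reid = Counted(toy_boxes), Counted(toy_embeddings)
backbone = Counted(lambda frame: {"frame": frame})  # toy "feature map"

sde_out = sde_step("frame-0", detector, reid)
joint_out = joint_step("frame-0", backbone, toy_boxes, toy_embeddings)
print(detector.calls + reid.calls, backbone.calls)  # → 2 1
```

Both pipelines return the same boxes and embeddings. In the separate pipeline, the per-frame cost is roughly the detector's time plus the re-ID model's time. In the joint pipeline, the heavy feature extraction is paid only once.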
Findings
Achieves 22-40 FPS depending on input resolution.
Maintains competitive tracking accuracy with 64.4% MOTA on MOT-16.
Reported by the authors as the first near real-time MOT system.
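To make the 22-40 FPS range concrete, the sketch below converts frame rates into per-frame time budgets. The 30 FPS target stream is an assumption chosen for illustration, not a figure from the paper. Under that assumption, the upper end of the range keeps pace with incoming video and the lower end does not, which is one plausible reading of why the system is described as near real-time.

```python
def frame_budget_ms(fps: float) -> float:
    """Per-frame processing time, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def keeps_up(tracker_fps: float, video_fps: float) -> bool:
    """True if the tracker processes frames at least as fast as they arrive."""
    return tracker_fps >= video_fps


ASSUMED_VIDEO_FPS = 30  # hypothetical real-time target, not taken from the paper

for fps in (22, 40):
    print(f"{fps} FPS -> {frame_budget_ms(fps):.1f} ms/frame, "
          f"keeps up with {ASSUMED_VIDEO_FPS} FPS video: {keeps_up(fps, ASSUMED_VIDEO_FPS)}")
# 22 FPS -> 45.5 ms/frame, keeps up with 30 FPS video: False
# 40 FPS -> 25.0 ms/frame, keeps up with 30 FPS video: True
```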
Abstract
Modern multiple object tracking (MOT) systems usually follow the tracking-by-detection paradigm. It has 1) a detection model for target localization and 2) an appearance embedding model for data association. Executing the two models separately can lead to efficiency problems: the running time is simply the sum of the two steps, and potential structures that could be shared between them go unexplored. Existing research on real-time MOT usually focuses on the association step, so those methods are essentially real-time association methods rather than real-time MOT systems. In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. Specifically, we incorporate the appearance embedding model into a single-shot detector, so that the model simultaneously outputs detections and the corresponding embeddings. We…
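The abstract assigns appearance embeddings to data association. The sketch below is a minimal illustration of that step only. It is not the association method the paper proposes, because the abstract is cut off before describing it. The sketch greedily links existing tracks to new detections by the cosine similarity of their embeddings. The function name `associate` and the threshold `min_sim = 0.5` are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def associate(tracks: Dict[int, List[float]], dets: List[List[float]],
              min_sim: float = 0.5) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match track embeddings to detection embeddings.

    Returns (matches as track_id -> detection index, unmatched detection indices).
    """
    pairs = sorted(
        ((cosine(t_emb, d_emb), tid, di)
         for tid, t_emb in tracks.items()
         for di, d_emb in enumerate(dets)),
        reverse=True,  # most similar pairs first
    )
    matches: Dict[int, int] = {}
    used_tracks, used_dets = set(), set()
    for sim, tid, di in pairs:
        if sim < min_sim:
            break  # pairs are sorted, so all remaining pairs are below threshold
        if tid in used_tracks or di in used_dets:
            continue  # each track and each detection is matched at most once
        matches[tid] = di
        used_tracks.add(tid)
        used_dets.add(di)
    unmatched = [di for di in range(len(dets)) if di not in used_dets]
    return matches, unmatched


tracks = {1: [1.0, 0.0], 2: [0.0, 1.0]}           # toy track embeddings
dets = [[0.1, 0.9], [0.95, 0.05], [-1.0, 0.0]]    # toy detection embeddings
matches, new_dets = associate(tracks, dets)
print(matches, new_dets)  # → {1: 1, 2: 0} [2]
```

Trackers commonly solve this assignment with the Hungarian algorithm and add motion cues such as a Kalman filter. Greedy matching is used here only to keep the sketch short and dependency-free. Unmatched detections would typically start new tracks.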
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques
