Towards Real-Time Multi-Object Tracking

Zhongdao Wang; Liang Zheng; Yixuan Liu; Yali Li; Shengjin Wang

arXiv:1909.12605 · cs.CV · July 15, 2020 · 32 citations

TL;DR

This paper introduces a near real-time multi-object tracking (MOT) system that integrates detection and appearance embedding into a single model, improving speed over running two separate models while maintaining competitive accuracy.

Contribution

The authors propose a unified model for detection and embedding in MOT, enabling near real-time performance with reduced computational cost.

Findings

01. Runs at 22-40 FPS, depending on input resolution.

02. Maintains competitive tracking accuracy, with 64.4% MOTA on MOT-16.

03. Reported by the authors as the first near real-time MOT system.

Abstract

Modern multiple object tracking (MOT) systems usually follow the tracking-by-detection paradigm. Such a system has 1) a detection model for target localization and 2) an appearance embedding model for data association. Having the two models separately executed might lead to efficiency problems, as the running time is simply a sum of the two steps without investigating potential structures that can be shared between them. Existing research efforts on real-time MOT usually focus on the association step, so they are essentially real-time association methods but not real-time MOT systems. In this paper, we propose an MOT system that allows target detection and appearance embedding to be learned in a shared model. Specifically, we incorporate the appearance embedding model into a single-shot detector, such that the model can simultaneously output detections and the corresponding embeddings. We…
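As a rough illustration of the data-association step the abstract refers to, the following minimal Python sketch matches existing tracks to new detections by the cosine similarity of their appearance embeddings. The greedy matching strategy, the 0.5 threshold, and the names `cosine_similarity` and `associate` are assumptions made for illustration only; they are not taken from the paper, which may use a different association procedure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily pair tracks with detections, most similar pairs first.

    Hypothetical helper, not the paper's method. Returns a tuple of
    (matches, unmatched_tracks, unmatched_detections), where matches is a
    sorted list of (track_index, detection_index) pairs.
    """
    # Collect every pair whose similarity clears the (assumed) threshold.
    candidates = []
    for t, track_emb in enumerate(track_embeddings):
        for d, det_emb in enumerate(detection_embeddings):
            similarity = cosine_similarity(track_emb, det_emb)
            if similarity >= min_similarity:
                candidates.append((similarity, t, d))

    # Accept pairs in descending similarity, using each index at most once.
    candidates.sort(reverse=True)
    matches, used_tracks, used_detections = [], set(), set()
    for _, t, d in candidates:
        if t in used_tracks or d in used_detections:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_detections.add(d)

    unmatched_tracks = [
        t for t in range(len(track_embeddings)) if t not in used_tracks
    ]
    unmatched_detections = [
        d for d in range(len(detection_embeddings)) if d not in used_detections
    ]
    return sorted(matches), unmatched_tracks, unmatched_detections


if __name__ == "__main__":
    tracks = [[1.0, 0.0], [0.0, 1.0]]
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(associate(tracks, detections))
```

In a two-model pipeline, the detection embeddings passed to a function like this would come from a second network run on each detected box. In the shared model the abstract describes, they would be produced alongside the detections in the same forward pass, so a step of this kind is the only per-frame work left after the network runs.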

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques