ByteTrack: Multi-Object Tracking by Associating Every Detection Box
Yifu Zhang, Peize Sun, Yi Jiang, Dongdong Yu, Fucheng Weng, Zehuan Yuan, Ping Luo, Wenyu Liu, Xinggang Wang

TL;DR
ByteTrack introduces a simple yet effective multi-object tracking method that associates almost every detection, including low-score ones, leading to significant improvements in tracking accuracy across multiple benchmarks.
Contribution
The paper proposes a novel association method that recovers true objects from low-score detections and integrates it into a strong tracker, achieving state-of-the-art results.
Findings
Achieves 80.3 MOTA on the MOT17 test set.
Improves IDF1 by 1 to 10 points when applied to 9 different state-of-the-art trackers.
Reports state-of-the-art performance on multiple benchmarks, including MOT17, MOT20, HiEve and BDD100K.
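For readers unfamiliar with the two metrics quoted above, the sketch below computes them from aggregate counts using their standard definitions: MOTA = 1 - (FN + FP + IDSW) / GT and IDF1 = 2·IDTP / (2·IDTP + IDFP + IDFN). The function names and the example counts are illustrative assumptions, not values from the paper.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt_boxes: int) -> float:
    """Multiple Object Tracking Accuracy from counts summed over all frames."""
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt_boxes


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """Identity F1: harmonic mean of identity precision and identity recall."""
    return 2 * idtp / (2 * idtp + idfp + idfn)


# Made-up counts, purely for illustration.
example_mota = mota(false_negatives=10, false_positives=5,
                    id_switches=5, num_gt_boxes=100)
example_idf1 = idf1(idtp=80, idfp=10, idfn=30)
```

MOTA is driven mostly by detection errors (misses and false positives), while IDF1 rewards keeping the same identity on an object over time, which is why the two are usually reported together.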
Abstract
Multi-object tracking (MOT) aims at estimating bounding boxes and identities of objects in videos. Most methods obtain identities by associating detection boxes whose scores are higher than a threshold. The objects with low detection scores, e.g. occluded objects, are simply thrown away, which brings non-negligible true object missing and fragmented trajectories. To solve this problem, we present a simple, effective and generic association method, tracking by associating almost every detection box instead of only the high score ones. For the low score detection boxes, we utilize their similarities with tracklets to recover true objects and filter out the background detections. When applied to 9 different state-of-the-art trackers, our method achieves consistent improvement on IDF1 score ranging from 1 to 10 points. To put forwards the state-of-the-art performance of MOT, we design a…
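The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain IoU as the similarity, a greedy matcher in place of an optimal assignment solver, and no motion prediction or track lifecycle management. The names `Detection`, `greedy_match`, `associate` and the threshold values are hypothetical, chosen only for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass
class Detection:
    box: Box
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def greedy_match(tracks: Dict[int, Box], dets: List[Detection], min_iou: float):
    """Pair tracks with detections greedily, highest IoU first.

    Returns (track_id -> detection index, unmatched track ids,
    unmatched detection indices).
    """
    pairs = sorted(
        ((iou(tbox, d.box), tid, j)
         for tid, tbox in tracks.items() for j, d in enumerate(dets)),
        reverse=True,
    )
    matches: Dict[int, int] = {}
    used = set()
    for sim, tid, j in pairs:
        if sim < min_iou:
            break  # every remaining pair is even less similar
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)
    unmatched_tracks = [tid for tid in tracks if tid not in matches]
    unmatched_dets = [j for j in range(len(dets)) if j not in used]
    return matches, unmatched_tracks, unmatched_dets


def associate(tracks: Dict[int, Box], dets: List[Detection],
              high_thresh: float = 0.6, low_thresh: float = 0.1,
              min_iou: float = 0.3):
    """Two-stage association; thresholds are illustrative assumptions."""
    high = [d for d in dets if d.score >= high_thresh]
    low = [d for d in dets if low_thresh <= d.score < high_thresh]

    # Stage 1: match all tracks against high-score detections.
    m1, remaining, new_idx = greedy_match(tracks, high, min_iou)
    updated = {tid: high[j].box for tid, j in m1.items()}

    # Stage 2: match the leftover tracks against low-score detections.
    # Low-score boxes that match no track are treated as background and dropped.
    leftover = {tid: tracks[tid] for tid in remaining}
    m2, lost, _ = greedy_match(leftover, low, min_iou)
    updated.update({tid: low[j].box for tid, j in m2.items()})

    # Unmatched high-score detections are candidates for new tracks.
    new_boxes = [high[j].box for j in new_idx]
    return updated, lost, new_boxes


# Usage: track 2 is only seen with a low score (e.g. partly occluded).
tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
dets = [
    Detection((1, 1, 11, 11), 0.9),          # confident, overlaps track 1
    Detection((21, 21, 31, 31), 0.3),        # low score, overlaps track 2
    Detection((50, 50, 60, 60), 0.2),        # low score, overlaps nothing
    Detection((100, 100, 110, 110), 0.8),    # confident, overlaps nothing
]
updated, lost, new_boxes = associate(tracks, dets)
```

In this example track 2 is recovered from a low-score box in the second stage, the isolated low-score box is discarded, and the isolated high-score box is returned as a candidate for a new track. A tracker that only kept detections above the high threshold would have lost track 2 in this frame.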
Taxonomy
Topics: Video Surveillance and Tracking Methods · UAV Applications and Optimization · Advanced Image and Video Retrieval Techniques
Methods: Test · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · (2+1)D Convolution
