Object-Centric Multiple Object Tracking
Zixu Zhao, Jiaze Wang, Max Horn, Yizhuo Ding, Tong He, Zechen Bai, Dominik Zietlow, Carl-Johann Simon-Gabriel, Bing Shuai, Zhuowen Tu, Thomas Brox, Bernt Schiele, Yanwei Fu, Francesco Locatello, Zheng Zhang, Tianjun Xiao

TL;DR
This paper introduces a video object-centric model for multiple object tracking that reduces annotation needs (sparse detection labels and no ID labels) and improves tracking consistency by leveraging an object memory and self-supervised learning.
Contribution
It proposes an object-centric MOT model with an index-merge module and an object memory module, which removes the need for ID labels and significantly narrows the performance gap with supervised methods.
Findings
Achieves high localization accuracy with minimal detection labels (0%-6.25%)
Outperforms several existing unsupervised trackers
Reduces reliance on supervised ID labels for tracking
Abstract
Unsupervised object-centric learning methods allow the partitioning of scenes into entities without additional localization information and are excellent candidates for reducing the annotation burden of multiple-object tracking (MOT) pipelines. Unfortunately, they lack two key properties: objects are often split into parts and are not consistently tracked over time. In fact, state-of-the-art models achieve pixel-level accuracy and temporal consistency by relying on supervised object detection with additional ID labels for the association through time. This paper proposes a video object-centric model for MOT. It consists of an index-merge module that adapts the object-centric slots into detection outputs and an object memory module that builds complete object prototypes to handle occlusions. Benefiting from object-centric learning, we only require sparse detection labels (0%-6.25%) for…
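The abstract names two components: an index-merge step that turns part-level slots into object-level detections, and an object memory that keeps prototypes so objects can be re-identified after occlusion. The sketch below only illustrates that data flow under strong assumptions; it is not the authors' implementation. The names `merge_slots` and `ObjectMemory`, the per-slot merge index being given in advance, the cosine-similarity threshold, the momentum prototype update and the greedy matching are all hypothetical choices made for illustration.

```python
import math


def merge_slots(slot_masks, merge_index):
    """Hypothetical index-merge: sum the masks of slots sharing an object index.

    slot_masks: list of K flattened soft masks (lists of floats), one per slot.
    merge_index: list of K ints; the object index assigned to each slot
    (assumed to be given here; how it is predicted is not shown).
    Returns one merged mask per distinct object index, sorted by index.
    """
    merged = {}
    for mask, obj in zip(slot_masks, merge_index):
        if obj not in merged:
            merged[obj] = list(mask)
        else:
            merged[obj] = [a + b for a, b in zip(merged[obj], mask)]
    return [merged[o] for o in sorted(merged)]


def _normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cosine(a, b):
    # Both inputs are assumed to be unit-normalized already.
    return sum(x * y for x, y in zip(a, b))


class ObjectMemory:
    """Hypothetical prototype memory that assigns track IDs per frame.

    Prototypes are kept even when an object is missing from a frame, so an
    occluded object can be matched again when it reappears.
    """

    def __init__(self, threshold=0.5, momentum=0.9):
        self.prototypes = {}  # track_id -> unit-norm feature vector
        self.threshold = threshold  # minimum cosine similarity to match
        self.momentum = momentum  # weight of the old prototype in updates
        self.next_id = 0

    def assign(self, features):
        ids = []
        for feat in features:
            f = _normalize(feat)
            best_id, best_sim = None, self.threshold
            # Greedy matching: each prototype is used at most once per frame.
            for tid, proto in self.prototypes.items():
                if tid in ids:
                    continue
                sim = _cosine(proto, f)
                if sim > best_sim:
                    best_id, best_sim = tid, sim
            if best_id is None:
                # No prototype is similar enough: start a new track.
                best_id = self.next_id
                self.next_id += 1
                self.prototypes[best_id] = f
            else:
                # Momentum update keeps the prototype stable over time.
                old = self.prototypes[best_id]
                blended = [self.momentum * p + (1 - self.momentum) * x
                           for p, x in zip(old, f)]
                self.prototypes[best_id] = _normalize(blended)
            ids.append(best_id)
        return ids


memory = ObjectMemory()
print(memory.assign([[1.0, 0.0], [0.0, 1.0]]))  # [0, 1]
print(memory.assign([[0.0, 1.0]]))              # [1]  (track 0 occluded)
print(memory.assign([[0.9, 0.1], [0.1, 0.9]]))  # [0, 1]  (track 0 recovered)
```

Greedy matching is used only to keep the sketch short; a bipartite assignment (for example the Hungarian algorithm) would be the more common choice when several objects compete for the same prototype.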
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition
