Multiple Object Tracking as ID Prediction
Ruopeng Gao, Ji Qi, Limin Wang

TL;DR
This paper introduces MOTIP, a novel end-to-end trainable approach that treats multi-object tracking as an ID prediction task, achieving state-of-the-art results without complex heuristics.
Contribution
The paper presents MOTIP, a simple and effective method that recasts object association as an ID prediction problem, enabling end-to-end learning and surpassing traditional heuristic-based methods.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Operates effectively using only object-level features.
Eliminates need for handcrafted association heuristics.
Abstract
Multi-Object Tracking (MOT) has been a long-standing challenge in video understanding. A natural and intuitive approach is to split this task into two parts: object detection and association. Most mainstream methods employ meticulously crafted heuristic techniques to maintain trajectory information and compute cost matrices for object matching. Although these methods can achieve notable tracking performance, they often require a series of elaborate handcrafted modifications while facing complicated scenarios. We believe that manually assumed priors limit the method's adaptability and flexibility in learning optimal tracking capabilities from domain-specific data. Therefore, we introduce a new perspective that treats Multiple Object Tracking as an in-context ID Prediction task, transforming the aforementioned object association into an end-to-end trainable task. Based on this, we propose…
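To make the "association as ID prediction" framing concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes each object is already summarized by a feature vector, and it replaces the learned predictor with a plain dot-product similarity followed by a softmax over ID labels. The names predict_ids, NEWBORN, and newborn_logit are hypothetical and exist only for illustration.

```python
import math

NEWBORN = -1  # hypothetical label meaning "no existing ID fits; start a new track"


def predict_ids(history, detections, newborn_logit=0.5):
    """Assign an ID label to each detection by classifying over known IDs.

    history: list of (id_label, feature_vector) pairs from earlier frames.
    detections: list of feature vectors from the current frame.
    newborn_logit: assumed fixed score for the NEWBORN class; a trained
        model would produce this score itself.
    """
    predictions = []
    for det in detections:
        # One logit per known ID: the best dot-product similarity
        # between this detection and any history entry with that ID.
        logits = {NEWBORN: newborn_logit}
        for id_label, feat in history:
            score = sum(a * b for a, b in zip(det, feat))
            logits[id_label] = max(logits.get(id_label, -math.inf), score)

        # Softmax over ID labels turns the similarities into class probabilities.
        peak = max(logits.values())
        exps = {label: math.exp(value - peak) for label, value in logits.items()}
        total = sum(exps.values())
        probs = {label: value / total for label, value in exps.items()}

        # The predicted ID is simply the most probable class.
        predictions.append(max(probs, key=probs.get))
    return predictions


# Two tracked objects (IDs 1 and 2) seen in earlier frames.
history = [(1, [1.0, 0.0]), (2, [0.0, 1.0]), (1, [0.9, 0.1])]
# Three detections in the current frame; the last resembles neither track.
detections = [[0.95, 0.05], [0.1, 0.9], [0.1, 0.1]]
print(predict_ids(history, detections))  # → [1, 2, -1]
```

The point of the sketch is that association becomes a classification over ID labels, with no cost matrix or matching step. Because each detection is classified independently here, two detections could receive the same ID; how a real system resolves that is outside the scope of this toy example.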
Taxonomy
Topics: Spam and Phishing Detection · User Authentication and Security Systems · Authorship Attribution and Profiling
