TL;DR
This paper enhances multi-object tracking in urban driving by integrating geometry, shape, and pose cues from monocular images, improving accuracy across various detectors and motion scenarios without relying on complex association frameworks.
Contribution
It introduces a novel set of pairwise costs based on 3D cues that are easy to implement, computable in real time, and compatible with any data association method.
Findings
The proposed costs yield consistent improvement over state-of-the-art methods.
They surpass the state of the art even with a simple two-frame Hungarian assignment.
The gains are robust across different object detectors and camera and object motion conditions.
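The two-frame association mentioned above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the cue names (pose_cost, shape_cost), the cost values, the equal weights, and the helper functions combine_costs and associate. Exhaustive search stands in for the Hungarian algorithm here; for small inputs it finds the same minimum-cost matching, only less efficiently.

```python
from itertools import permutations

def combine_costs(cue_costs, weights):
    """Weighted sum of per-cue cost matrices (hypothetical combination rule).

    Each matrix has one row per existing track and one column per new detection.
    """
    rows, cols = len(cue_costs[0]), len(cue_costs[0][0])
    return [
        [sum(w * c[i][j] for w, c in zip(weights, cue_costs)) for j in range(cols)]
        for i in range(rows)
    ]

def associate(cost, gate=float("inf")):
    """Minimum-cost one-to-one matching of tracks (rows) to detections (columns).

    Exhaustive search over all assignments; it stands in for the Hungarian
    algorithm and assumes there are at least as many detections as tracks.
    Pairs whose cost exceeds `gate` are dropped after matching.
    """
    n_rows, n_cols = len(cost), len(cost[0])
    if n_rows > n_cols:
        raise ValueError("sketch assumes rows <= columns")
    best = min(
        permutations(range(n_cols), n_rows),
        key=lambda perm: sum(cost[i][j] for i, j in enumerate(perm)),
    )
    return [(i, j) for i, j in enumerate(best) if cost[i][j] <= gate]

# Made-up costs for 2 tracks in frame t and 3 detections in frame t+1.
pose_cost = [[0.1, 0.9, 0.8], [0.7, 0.2, 0.9]]
shape_cost = [[0.2, 0.8, 0.7], [0.9, 0.1, 0.6]]

combined = combine_costs([pose_cost, shape_cost], weights=[0.5, 0.5])
matches = associate(combined)
print(matches)
```

With these made-up values, track 0 pairs with detection 0 and track 1 with detection 1, leaving detection 2 unmatched as a candidate new track. Because the costs enter only through the combined matrix, the matching step could be swapped for any other association method without touching the cost code.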
Abstract
This paper introduces geometry, object shape, and pose costs for multi-object tracking in urban driving scenarios. Using images from a monocular camera alone, we devise pairwise costs for object tracks, based on several 3D cues such as object pose, shape, and motion. The proposed costs are agnostic to the data association method and can be incorporated into any optimization framework to output the pairwise data associations. These costs are easy to implement, can be computed in real time, and complement each other to account for possible errors in a tracking-by-detection framework. We perform an extensive analysis of the designed costs and empirically demonstrate consistent improvement over the state-of-the-art under varying conditions that employ a range of object detectors, exhibit a variety of camera and object motions, and, more importantly, are not reliant on the choice of the…
