3D Multi-Object Tracking: A Baseline and New Evaluation Metrics
Xinshuo Weng, Jianren Wang, David Held, Kris Kitani

TL;DR
This paper introduces a simple, real-time 3D multi-object tracking system using classical methods, along with new evaluation metrics and tools, achieving state-of-the-art performance and high speed on benchmarks.
Contribution
The work presents a lightweight 3D MOT system with new evaluation metrics and tools, enabling fair comparison and demonstrating competitive results without relying on 2D data.
Findings
Achieves state-of-the-art performance on KITTI and nuScenes benchmarks.
Runs at 207.4 FPS, the fastest among modern MOT systems.
Provides new evaluation metrics and an open-source toolkit for 3D MOT assessment.
Abstract
3D multi-object tracking (MOT) is an essential component for many applications such as autonomous driving and assistive robotics. Recent work on 3D MOT focuses on developing accurate systems, giving less attention to practical considerations such as computational cost and system complexity. In contrast, this work proposes a simple real-time 3D MOT system. Our system first obtains 3D detections from a LiDAR point cloud. Then, a straightforward combination of a 3D Kalman filter and the Hungarian algorithm is used for state estimation and data association. Additionally, 3D MOT datasets such as KITTI evaluate MOT methods in the 2D space, and standardized 3D MOT evaluation tools are missing for a fair comparison of 3D MOT methods. Therefore, we propose a new 3D MOT evaluation tool along with three new metrics to comprehensively evaluate 3D MOT methods. We show that, although our system employs…
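The predict, associate, update loop described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. Several things here are assumptions made for illustration: the class and function names (`AxisKalman`, `Track`, `associate`), the constant-velocity motion model applied independently per axis, the noise values, the use of center distance as the matching cost, and the `max_dist` gate. To stay dependency-free, the matching step uses an exhaustive search over assignments as a stand-in for the Hungarian algorithm; it returns the same minimum-cost matching but is only practical for a handful of objects.

```python
from itertools import permutations
from math import dist


class AxisKalman:
    """Constant-velocity Kalman filter for one coordinate (state: position, velocity)."""

    def __init__(self, pos, pos_var=1.0, vel_var=10.0, q=0.01, r=0.1):
        self.x = [pos, 0.0]                         # state estimate [position, velocity]
        self.P = [[pos_var, 0.0], [0.0, vel_var]]   # state covariance
        self.q, self.r = q, r                       # assumed process / measurement noise

    def predict(self, dt=1.0):
        p, v = self.x
        self.x = [p + dt * v, v]
        (a, b), (c, d) = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]] and Q = q * I
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]

    def update(self, z):
        (a, b), (c, d) = self.P
        s = a + self.r                  # innovation variance (only position is measured)
        k0, k1 = a / s, c / s           # Kalman gain
        y = z - self.x[0]               # innovation
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


class Track:
    """One tracked object: an independent filter per axis of the 3D center."""

    def __init__(self, track_id, center):
        self.id = track_id
        self.axes = [AxisKalman(c) for c in center]

    @property
    def center(self):
        return tuple(f.x[0] for f in self.axes)

    def predict(self):
        for f in self.axes:
            f.predict()
        return self.center

    def update(self, center):
        for f, z in zip(self.axes, center):
            f.update(z)


def associate(tracks_xyz, dets_xyz, max_dist=2.0):
    """Minimum-total-cost matching of predicted track centers to detections.

    Exhaustive search stands in for the Hungarian algorithm here.
    Returns (track_index, detection_index) pairs that pass the distance gate.
    """
    n, m = len(tracks_xyz), len(dets_xyz)
    if n == 0 or m == 0:
        return []
    cost = [[dist(t, d) for d in dets_xyz] for t in tracks_xyz]
    if n <= m:
        candidates = ([(i, p[i]) for i in range(n)] for p in permutations(range(m), n))
    else:
        candidates = ([(p[j], j) for j in range(m)] for p in permutations(range(n), m))
    best = min(candidates, key=lambda pairs: sum(cost[i][j] for i, j in pairs))
    return [(i, j) for i, j in best if cost[i][j] <= max_dist]


# One tracking step on made-up data: two existing tracks, three new detections.
tracks = [Track(0, (0.0, 0.0, 0.0)), Track(1, (10.0, 0.0, 0.0))]
detections = [(10.5, 0.1, 0.0), (0.4, 0.0, 0.0), (50.0, 50.0, 0.0)]

predicted = [t.predict() for t in tracks]
matches = associate(predicted, detections)
for ti, di in matches:
    tracks[ti].update(detections[di])
matched_dets = {di for _, di in matches}
unmatched = [di for di in range(len(detections)) if di not in matched_dets]
```

In this toy step the far-away third detection stays unmatched, which in a full tracker would typically start a new track; track birth and death rules are omitted from the sketch.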
Taxonomy
Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
