3D Multi-Object Tracking with Semi-Supervised GRU-Kalman Filter
Xiaoxiang Wang, Jiaxin Liu, Miaojie Feng, Zhaoxing Zhang, Xin Yang

TL;DR
This paper introduces a semi-supervised GRU-Kalman filter approach for 3D multi-object tracking that learns complex motion patterns directly from data, improving accuracy over traditional linear models.
Contribution
Integrates a learnable Kalman filter with a semi-supervised training strategy, enabling data-driven motion modeling in 3D MOT that outperforms traditional tracking-by-detection methods on nuScenes and Argoverse2.
Findings
Outperforms traditional tracking-by-detection methods on the nuScenes and Argoverse2 datasets.
Learns complex, nonlinear object motion without manual model design.
Improves robustness and convergence speed through semi-supervised learning.
Abstract
3D Multi-Object Tracking (MOT), a fundamental component of environmental perception, is essential for intelligent systems like autonomous driving and robotic sensing. Although Tracking-by-Detection frameworks have demonstrated excellent performance in recent years, their application in real-world scenarios faces significant challenges. Object movement in complex environments is often highly nonlinear, while existing methods typically rely on linear approximations of motion. Furthermore, system noise is frequently modeled as a Gaussian distribution, which fails to capture the true complexity of the noise dynamics. These oversimplified modeling assumptions can lead to significant reductions in tracking precision. To address this, we propose a GRU-based MOT method, which introduces a learnable Kalman filter into the motion module. This approach is able to learn object motion…
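To make the idea of a learnable Kalman filter concrete, here is a minimal sketch of one way a GRU can replace the analytically computed Kalman gain. It is an illustration under stated assumptions, not the authors' architecture, which this excerpt does not describe. The names `TinyGRU` and `GRUKalmanSketch`, the 1D position-velocity state, the constant-velocity prediction, the sigmoid-bounded gain, and the random untrained weights are all hypothetical choices made for brevity.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


class TinyGRU:
    """Minimal bias-free GRU cell with random, untrained weights (hypothetical)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        n_cat = n_in + n_hidden

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.Wz = mat(n_hidden, n_cat)  # update-gate weights
        self.Wr = mat(n_hidden, n_cat)  # reset-gate weights
        self.Wh = mat(n_hidden, n_cat)  # candidate-state weights
        self.h = [0.0] * n_hidden

    def step(self, x):
        xh = list(x) + self.h
        z = [sigmoid(a) for a in matvec(self.Wz, xh)]
        r = [sigmoid(a) for a in matvec(self.Wr, xh)]
        xrh = list(x) + [ri * hi for ri, hi in zip(r, self.h)]
        cand = [math.tanh(a) for a in matvec(self.Wh, xrh)]
        self.h = [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, self.h, cand)]
        return self.h


class GRUKalmanSketch:
    """Kalman-style predict/update where the gain comes from a GRU.

    Assumptions: state = [position, velocity], position-only measurement,
    constant-velocity prediction. A trained model would learn the weights;
    here they are random, so the output is not a meaningful estimate.
    """

    def __init__(self, dt=0.5, n_hidden=8, seed=0):
        self.F = [[1.0, dt], [0.0, 1.0]]  # state transition
        self.H = [[1.0, 0.0]]             # measurement model
        self.gru = TinyGRU(n_in=1, n_hidden=n_hidden, seed=seed)
        rng = random.Random(seed + 1)
        # Linear read-out from the GRU hidden state to a 2x1 gain.
        self.W_out = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(2)]
        self.x = [0.0, 0.0]

    def step(self, z):
        x_pred = matvec(self.F, self.x)             # predict
        innovation = z - matvec(self.H, x_pred)[0]  # measurement residual
        h = self.gru.step([innovation])             # recurrent summary of residuals
        gain = [sigmoid(g) for g in matvec(self.W_out, h)]  # stands in for K
        self.x = [xp + k * innovation for xp, k in zip(x_pred, gain)]  # update
        return self.x, gain


# Usage: feed a short sequence of noisy position measurements.
tracker = GRUKalmanSketch()
for z in [0.0, 0.6, 1.1, 1.4]:
    state, gain = tracker.step(z)
    print(state, gain)
```

In a classical Kalman filter the gain is derived from Gaussian process and measurement covariances. In this sketch the gain depends instead on the history of residuals held in the GRU hidden state, which is the kind of substitution that lets a filter adapt to nonlinear motion and non-Gaussian noise once its weights are trained.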
Taxonomy
Topics: Video Surveillance and Tracking Methods · Infrared Target Detection Methodologies · Robotics and Sensor-Based Localization
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
