3D Multi-Object Tracking with Semi-Supervised GRU-Kalman Filter

Xiaoxiang Wang; Jiaxin Liu; Miaojie Feng; Zhaoxing Zhang; Xin Yang

arXiv:2411.08433 · cs.RO · November 14, 2024

TL;DR

This paper introduces a semi-supervised GRU-Kalman filter approach for 3D multi-object tracking that learns complex motion patterns directly from data, improving accuracy over traditional linear models.

Contribution

The novel integration of a learnable Kalman filter with a semi-supervised training strategy enables data-driven motion modeling in 3D MOT, surpassing existing methods.

Findings

01. Outperforms traditional tracking-by-detection methods on the nuScenes and Argoverse2 datasets.

02. Learns complex, nonlinear object motion without manual model design.

03. Improves robustness and convergence speed through semi-supervised learning.

Abstract

3D Multi-Object Tracking (MOT), a fundamental component of environmental perception, is essential for intelligent systems like autonomous driving and robotic sensing. Although Tracking-by-Detection frameworks have demonstrated excellent performance in recent years, their application in real-world scenarios faces significant challenges. Object movement in complex environments is often highly nonlinear, while existing methods typically rely on linear approximations of motion. Furthermore, system noise is frequently modeled as a Gaussian distribution, which fails to capture the true complexity of the noise dynamics. These oversimplified modeling assumptions can lead to significant reductions in tracking precision. To address this, we propose a GRU-based MOT method, which introduces a learnable Kalman filter into the motion module. This approach is able to learn object motion…
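For context, the linear, Gaussian-noise motion model that the abstract contrasts against can be sketched as a classical constant-velocity Kalman filter. The sketch below is only an illustration of that baseline for a single 1D coordinate; it is not the paper's GRU-based filter, and the function names (`kf_predict`, `kf_update`, `track`) and the noise values `q` and `r` are assumptions chosen for the example.

```python
def kf_predict(x, P, dt, q):
    """Constant-velocity prediction for state x = [position, velocity]."""
    pos, vel = x
    # x' = F x with F = [[1, dt], [0, 1]] (the linear motion assumption)
    x_pred = [pos + dt * vel, vel]
    # P' = F P F^T + Q with Q = q * I (assumed Gaussian process noise)
    p00, p01, p10, p11 = P[0][0], P[0][1], P[1][0], P[1][1]
    P_pred = [
        [p00 + dt * (p01 + p10) + dt * dt * p11 + q, p01 + dt * p11],
        [p10 + dt * p11, p11 + q],
    ]
    return x_pred, P_pred


def kf_update(x, P, z, r):
    """Correct the prediction with a position-only measurement z (H = [1, 0])."""
    s = P[0][0] + r                    # innovation variance S = H P H^T + R
    k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain K = P H^T / S
    y = z - x[0]                       # innovation (measurement residual)
    x_new = [x[0] + k0 * y, x[1] + k1 * y]
    # P_new = (I - K H) P
    P_new = [
        [(1.0 - k0) * P[0][0], (1.0 - k0) * P[0][1]],
        [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]],
    ]
    return x_new, P_new


def track(measurements, dt=1.0, q=1e-4, r=0.1):
    """Run predict/update over a sequence of position measurements."""
    x = [0.0, 0.0]                     # hypothetical initial state
    P = [[10.0, 0.0], [0.0, 10.0]]     # hypothetical initial uncertainty
    for z in measurements:
        x, P = kf_predict(x, P, dt, q)
        x, P = kf_update(x, P, z, r)
    return x, P


if __name__ == "__main__":
    # An object moving at a constant 2 units per step, observed without noise.
    state, cov = track([2.0 * t for t in range(1, 21)])
    print(state)
```

Here the transition matrix and the noise levels `q` and `r` are fixed by hand, which is the kind of manual modeling the abstract describes as a limitation when motion is nonlinear or the noise is not Gaussian.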

Taxonomy

Topics: Video Surveillance and Tracking Methods · Infrared Target Detection Methodologies · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings