RegTrack: Simplicity Beneath Complexity in Robust Multi-Modal 3D Multi-Object Tracking
Lipeng Gu, Xuefeng Yan, Song Wang, Mingqiang Wei

TL;DR
RegTrack introduces a simple yet effective multi-modal 3D multi-object tracking method that leverages a unified tri-cue encoder inspired by gauge theory, achieving superior robustness and efficiency without complex association metrics.
Contribution
The paper proposes RegTrack, a 3D MOT approach that uses a unified tri-cue encoder and pairwise similarity-based association, challenging the need for complex association metrics and class-specific motion priors.
Findings
Outperforms 35 competitors on KITTI and nuScenes datasets.
Uses only 2.6 million parameters, demonstrating efficiency.
Achieves robust and generalizable tracking with point cloud inputs.
Abstract
Existing 3D multi-object tracking (MOT) methods often sacrifice efficiency and generalizability for robustness, largely relying on complex association metrics derived from multi-modal architectures and class-specific motion priors. Challenging the rooted belief that greater complexity necessarily yields greater robustness, we propose a robust, efficient, and generalizable method for multi-modal 3D MOT, dubbed RegTrack. Inspired by Yang-Mills gauge theory, RegTrack is built upon a unified tri-cue encoder (UTEnc), comprising three tightly coupled components: a local-global point cloud encoder (LG-PEnc), a mixture-of-experts-based geometry encoder (MoE-GEnc), and an image encoder from a well-pretrained visual-language model. LG-PEnc efficiently encodes the spatial and structural information of point clouds to produce foundational representations for each object, whose pairwise similarities…
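The summary above describes association driven by pairwise similarities between per-object representations. As a rough illustration of that general idea only, the sketch below matches existing tracks to new detections by cosine similarity with greedy one-to-one assignment. The choice of cosine similarity, the greedy matching, the `min_sim` threshold, and all function names are assumptions made for this example; the text above does not specify how RegTrack computes or uses its similarities.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_embs, det_embs, min_sim=0.5):
    """Greedy one-to-one matching of tracks to detections (hypothetical helper).

    track_embs, det_embs: lists of embedding vectors (lists of floats).
    min_sim: assumed similarity threshold below which no match is made.
    Returns (matches, unmatched_tracks, unmatched_detections), where matches
    is a sorted list of (track_index, detection_index) pairs.
    """
    # Score every track/detection pair and keep those above the threshold.
    candidates = []
    for t, t_emb in enumerate(track_embs):
        for d, d_emb in enumerate(det_embs):
            sim = cosine_similarity(t_emb, d_emb)
            if sim >= min_sim:
                candidates.append((sim, t, d))

    # Take the most similar pairs first; each track and detection is used once.
    candidates.sort(key=lambda c: c[0], reverse=True)
    used_tracks, used_dets, matches = set(), set(), []
    for _, t, d in candidates:
        if t in used_tracks or d in used_dets:
            continue
        used_tracks.add(t)
        used_dets.add(d)
        matches.append((t, d))

    unmatched_tracks = [t for t in range(len(track_embs)) if t not in used_tracks]
    unmatched_dets = [d for d in range(len(det_embs)) if d not in used_dets]
    return sorted(matches), unmatched_tracks, unmatched_dets
```

For example, with tracks `[[1, 0, 0], [0, 1, 0]]` and detections `[[0, 0.9, 0.1], [0.9, 0.1, 0], [0, 0, 1]]`, the call `associate(tracks, dets)` pairs track 0 with detection 1 and track 1 with detection 0, and leaves detection 2 unmatched, which a tracker would typically treat as a candidate new object. Greedy matching is used here only to keep the sketch dependency-free; an optimal assignment solver is a common alternative.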
Taxonomy
Topics: 3D Shape Modeling and Analysis · Human Pose and Action Recognition · Robotics and Sensor-Based Localization
Methods: Contrastive Language-Image Pre-training
