A Lightweight and Detector-free 3D Single Object Tracker on Point Clouds
Yan Xia, Qiangqiang Wu, Wei Li, Antoni B. Chan, Uwe Stilla

TL;DR
This paper introduces DMT, a lightweight, detector-free 3D single object tracker that leverages motion cues for faster and more accurate tracking in point clouds without relying on complex 3D detectors.
Contribution
The paper proposes a novel detector-free 3D tracking network that uses motion prediction and explicit voting, improving speed and accuracy over existing methods.
Findings
Achieves roughly 10% better tracking performance on the nuScenes dataset.
Runs at 72 FPS, faster than previous approaches.
Does not require complex 3D detectors.
Abstract
Recent works on 3D single object tracking treat the task as a target-specific 3D detection task, where an off-the-shelf 3D detector is commonly employed for the tracking. However, it is non-trivial to perform accurate target-specific detection since the point cloud of objects in raw LiDAR scans is usually sparse and incomplete. In this paper, we address this issue by explicitly leveraging temporal motion cues and propose DMT, a Detector-free Motion-prediction-based 3D Tracking network that completely removes the usage of complicated 3D detectors and is lighter, faster, and more accurate than previous trackers. Specifically, the motion prediction module is first introduced to estimate a potential target center of the current frame in a point-cloud-free manner. Then, an explicit voting module is proposed to directly regress the 3D box from the estimated target center. Extensive…
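The two-stage flow described in the abstract (first estimate a target center from motion alone, then derive the box from that center) can be illustrated with a small sketch. This is not the paper's network: the learned motion prediction module is replaced here by a constant-velocity extrapolation, and the explicit voting module by a simple average of nearby points. The names `Box3D`, `predict_center`, `refine_center`, `track_step`, and the `radius` parameter are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Box3D:
    center: Vec3
    size: Vec3   # (width, length, height); assumed fixed for a rigid target
    yaw: float   # heading angle in radians


def predict_center(history: Sequence[Vec3]) -> Vec3:
    """Stage 1 stand-in: guess the current center from past centers only.

    Uses constant-velocity extrapolation of the last two centers. No points
    from the current frame are read, mirroring the "point-cloud-free" idea.
    """
    if not history:
        raise ValueError("need at least one previous center")
    if len(history) == 1:
        return history[-1]
    prev, last = history[-2], history[-1]
    return tuple(2.0 * l - p for p, l in zip(prev, last))


def refine_center(points: Sequence[Vec3], center: Vec3, radius: float) -> Vec3:
    """Stage 2 stand-in: average the points within `radius` of the guess.

    A crude placeholder for a learned voting/regression step. Falls back to
    the predicted center when no points are nearby (sparse scans).
    """
    r2 = radius * radius
    near: List[Vec3] = [
        p for p in points
        if sum((a - b) ** 2 for a, b in zip(p, center)) <= r2
    ]
    if not near:
        return center
    n = len(near)
    return tuple(sum(p[i] for p in near) / n for i in range(3))


def track_step(history: Sequence[Vec3], prev_box: Box3D,
               points: Sequence[Vec3], radius: float = 2.0) -> Box3D:
    """One tracking step: motion-based guess, then point-based refinement."""
    coarse = predict_center(history)
    center = refine_center(points, coarse, radius)
    # Size and yaw are carried over unchanged in this simplified sketch.
    return Box3D(center=center, size=prev_box.size, yaw=prev_box.yaw)
```

The design point the sketch tries to convey is that the first stage never touches the current scan, so a sparse or incomplete point cloud cannot derail the initial estimate; the second stage only has to correct a nearby guess rather than search the whole scene.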
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Neural Network Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
