STTracker: Spatio-Temporal Tracker for 3D Single Object Tracking
Yubo Cui, Zhiheng Li, Zheng Fang

TL;DR
STTracker introduces a multi-frame spatio-temporal approach with patch-level feature encoding and sparse attention to improve 3D single object tracking performance on large-scale benchmarks.
Contribution
STTracker takes multi-frame point clouds as input, rather than only the last two frames used by previous methods, and uses patch-level sparse attention to encode temporal information, which improves tracking accuracy over those methods.
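Patch-level sparse attention can be illustrated with a generic construction. The sketch below rests on these assumptions: point features are mean-pooled into patches, and each query patch attends only to its top-k highest-scoring key patches. The names `mean_pool_patches` and `topk_sparse_attention`, the pooling rule, and the top-k sparsity pattern are hypothetical illustrations, not STTracker's actual design.

```python
import math
from typing import List

Vector = List[float]


def mean_pool_patches(point_feats: List[Vector], patch_size: int) -> List[Vector]:
    """Group consecutive point features into patches and mean-pool each patch.

    Assumption (hypothetical): points are already ordered so that spatial
    neighbours are adjacent; the paper's actual grouping may differ.
    """
    patches = []
    for start in range(0, len(point_feats), patch_size):
        group = point_feats[start:start + patch_size]
        dim = len(group[0])
        patches.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return patches


def topk_sparse_attention(queries: List[Vector], keys: List[Vector],
                          values: List[Vector], k: int) -> List[Vector]:
    """Scaled dot-product attention where each query keeps only its k best keys.

    Keys outside the top-k receive zero weight; the softmax is computed over
    the retained k scores only (hypothetical sparsity pattern).
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, key)) for key in keys]
        # Indices of the k largest scores; all other keys are masked out.
        top = sorted(range(len(keys)), key=lambda j: scores[j], reverse=True)[:k]
        m = max(scores[j] for j in top)  # subtract max for numerical stability
        weights = {j: math.exp(scores[j] - m) for j in top}
        total = sum(weights.values())
        outputs.append([sum(weights[j] / total * values[j][d] for j in top)
                        for d in range(dim)])
    return outputs
```

In a multi-frame setting, the queries could be patches from the current frame and the keys and values could be patches pooled from all input frames; restricting each query to k keys keeps the cost per query proportional to k rather than to the total number of patches.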
Findings
Achieves 62.6% success on the KITTI benchmark.
Achieves 49.66% success on the nuScenes benchmark.
Outperforms existing methods on large-scale datasets.
Abstract
3D single object tracking with point clouds is a critical task in 3D computer vision. Previous methods usually take the last two frames as input, use the predicted box to obtain the template point cloud in the previous frame and the search-area point cloud in the current frame, and then apply similarity-based or motion-based methods to predict the current box. Although these methods achieve good tracking performance, they ignore the historical information of the target, which is important for tracking. In this paper, instead of inputting two frames of point clouds, we input multiple frames of point clouds to encode the spatio-temporal information of the target and learn its motion implicitly, which builds correlations among different frames to track the target in the current frame efficiently. Meanwhile, rather than directly using the point feature for…
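The contrast between two-frame and multi-frame input can be made concrete with a small sketch of input assembly. It assumes the following: boxes are axis-aligned `(cx, cy, cz, length, width, height)` tuples with yaw ignored for brevity, each historical frame is cropped with its predicted box, the current frame is cropped with an enlarged copy of the latest predicted box to form the search area, and every point is tagged with its frame offset. The helper names `crop` and `build_multi_frame_input`, the enlargement margin, and the time-offset channel are hypothetical choices, not the paper's pipeline.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
# (cx, cy, cz, length, width, height); axis-aligned, yaw ignored (simplification).
Box = Tuple[float, float, float, float, float, float]


def crop(points: Sequence[Point], box: Box, enlarge: float = 0.0) -> List[Point]:
    """Keep the points inside an axis-aligned box, optionally enlarged on every side."""
    cx, cy, cz, l, w, h = box
    hx, hy, hz = l / 2 + enlarge, w / 2 + enlarge, h / 2 + enlarge
    return [p for p in points
            if abs(p[0] - cx) <= hx and abs(p[1] - cy) <= hy and abs(p[2] - cz) <= hz]


def build_multi_frame_input(frames: Sequence[Sequence[Point]], boxes: Sequence[Box],
                            search_enlarge: float = 2.0) -> List[Tuple[float, float, float, float]]:
    """Assemble (x, y, z, t) points from N frames ordered oldest to current.

    boxes holds the predicted boxes of the N - 1 historical frames. The offset t is
    0 for the current frame, -1 for the previous one, and so on. With N = 2 this
    reduces to the usual template/search-area pair.
    """
    n = len(frames)
    assert len(boxes) == n - 1, "need one predicted box per historical frame"
    out = []
    for i in range(n - 1):
        t = float(i - (n - 1))  # relative frame offset of this historical frame
        out.extend((x, y, z, t) for (x, y, z) in crop(frames[i], boxes[i]))
    # Search area: current frame cropped around the latest predicted box, enlarged.
    out.extend((x, y, z, 0.0)
               for (x, y, z) in crop(frames[-1], boxes[-1], enlarge=search_enlarge))
    return out
```

Tagging each point with its frame offset is one simple way to let a downstream network distinguish frames when all points are processed jointly; a real tracker would use oriented boxes and typically resample each crop to a fixed number of points.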
