Exploiting More Information in Sparse Point Cloud for 3D Single Object Tracking
Yubo Cui, Jiayao Shan, Zuoxu Gu, Zhiheng Li, Zheng Fang

TL;DR
This paper introduces a transformer-based framework that converts sparse 3D point clouds into dense bird's-eye-view (BEV) representations and uses attention to improve 3D single object tracking, especially in extremely sparse scenarios.
Contribution
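It proposes a sparse-to-dense transformation (3D pillars compressed into 2D BEV features) combined with an attention-based encoder that computes global similarity between the template and search branches. The combination targets the extremely sparse cases, such as distant or partially occluded objects, where previous trackers usually fail.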
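The sparse-to-dense step can be illustrated with a minimal sketch, assuming a PointPillars-style scatter: points are grouped into vertical pillars on a regular x-y grid, and every grid cell, empty or not, receives a feature, which yields a dense 2D BEV map. The function name points_to_bev, the grid parameters, and the hand-crafted per-pillar features (point count and maximum height) are hypothetical stand-ins for the paper's learned pillar encoder, which is not described here.

```python
from collections import defaultdict


def points_to_bev(points, x_range, y_range, cell_size):
    """Scatter (x, y, z) points into a dense 2D BEV grid of pillar features.

    Hypothetical hand-crafted features replace a learned pillar encoder:
    each cell stores (point_count, max_z). Empty cells stay at (0, 0.0).
    """
    x_min, x_max = x_range
    y_min, y_max = y_range
    width = int(round((x_max - x_min) / cell_size))
    height = int(round((y_max - y_min) / cell_size))

    # Group points into vertical pillars keyed by their (row, col) cell.
    pillars = defaultdict(list)
    for x, y, z in points:
        if not (x_min <= x < x_max and y_min <= y < y_max):
            continue  # drop points outside the region of interest
        col = min(int((x - x_min) // cell_size), width - 1)
        row = min(int((y - y_min) // cell_size), height - 1)
        pillars[(row, col)].append(z)

    # Dense BEV map: every cell has a feature, even if no point fell into it.
    bev = [[(0, 0.0) for _ in range(width)] for _ in range(height)]
    for (row, col), zs in pillars.items():
        bev[row][col] = (len(zs), max(zs))
    return bev


# Usage: three points on a 2 m x 2 m area with 0.5 m cells (a 4 x 4 grid).
pts = [(0.2, 0.3, 1.0), (0.4, 0.1, 1.5), (1.7, 1.2, 0.5)]
bev = points_to_bev(pts, x_range=(0.0, 2.0), y_range=(0.0, 2.0), cell_size=0.5)
# bev[0][0] == (2, 1.5): the first two points share the same pillar.
```

Because the output is a fixed-size grid rather than a variable-length point set, later stages can use dense 2D operations such as convolution or attention over cells, which is the motivation the abstract gives for the BEV representation.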
Findings
Achieves promising results on the KITTI and nuScenes datasets.
Improves tracking performance in extremely sparse scenarios.
Utilizes multi-scale attention to compensate for information loss.
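The page does not say how the multi-scale attention is constructed. One common way to give an attention layer multi-scale input is to build a feature pyramid from the BEV map and flatten every level into tokens, so coarse levels summarize regions where fine cells are mostly empty. The sketch below does this under loudly stated assumptions: scalar cell features, 2 x 2 average pooling between levels, and a hypothetical (level, row, col, value) token layout. None of these details come from the paper.

```python
def avg_pool_2x2(grid):
    """Halve a 2D feature map by averaging non-overlapping 2x2 blocks."""
    h, w = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def multi_scale_tokens(grid, num_levels):
    """Build a pooled pyramid (hypothetical) and flatten all levels into tokens.

    Each token is (level, row, col, value); level 0 is the full-resolution map.
    """
    tokens = []
    level_map = grid
    for level in range(num_levels):
        for r, row in enumerate(level_map):
            for c, value in enumerate(row):
                tokens.append((level, r, c, value))
        if len(level_map) < 2 or len(level_map[0]) < 2:
            break  # cannot pool any further
        level_map = avg_pool_2x2(level_map)
    return tokens


# Usage: a 4 x 4 map with values 0..15 gives 16 + 4 + 1 = 21 tokens.
grid = [[float(4 * r + c) for c in range(4)] for r in range(4)]
tokens = multi_scale_tokens(grid, num_levels=3)
```

An attention layer fed with all levels at once can relate a fine cell to a coarse summary of its neighborhood, which is one plausible reading of "compensating for information loss"; the actual design may differ.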
Abstract
3D single object tracking is a key task in 3D computer vision. However, the sparsity of point clouds makes it difficult to compute similarity and locate the object, posing significant challenges to 3D trackers. Previous works tried to solve this problem and improved tracking performance in some common scenarios, but they usually failed in extremely sparse scenarios, such as tracking objects at long distances or partially occluded objects. To address these problems, in this letter, we propose a sparse-to-dense and transformer-based framework for 3D single object tracking. First, we transform the sparse 3D points into 3D pillars and then compress them into 2D BEV features to obtain a dense representation. Then, we propose an attention-based encoder to achieve global similarity computation between the template and search branches, which can alleviate the influence of sparsity. Meanwhile,…
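The "global similarity computation between template and search branches" can be sketched as scaled dot-product cross-attention: every search-region token is compared with every template token, so similarity is not restricted to local neighborhoods where sparse points may be missing. This is a minimal, assumption-laden illustration, not the paper's encoder: it uses raw feature vectors as queries, keys, and values (a trained model would apply learned projections and multiple heads), and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(search, template):
    """Scaled dot-product attention: each search token attends to all template tokens.

    search:   list of query vectors (e.g. one per BEV cell of the search region)
    template: list of key/value vectors (e.g. one per BEV cell of the template)
    Returns (outputs, weights), where weights[i][j] is how strongly search
    token i attends to template token j. No learned projections are applied.
    """
    d = len(template[0])
    scale = 1.0 / math.sqrt(d)
    outputs, weights = [], []
    for q in search:
        # Similarity of this query to every template token (global, not local).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in template]
        w = softmax(scores)
        # Weighted sum of template features, one output dimension at a time.
        out = [sum(wj * v[dim] for wj, v in zip(w, template)) for dim in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Usage: the first search token resembles the first template token;
# the second search token is all zeros and attends uniformly.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[5.0, 0.0], [0.0, 0.0]]
outputs, weights = cross_attention(search, template)
```

The cost is quadratic in the number of tokens, which is one reason a compact BEV grid is a convenient input for this kind of encoder.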
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Human Pose and Action Recognition
