FocusTrack: One-Stage Focus-and-Suppress Framework for 3D Point Cloud Object Tracking
Sifan Zhou, Jiahao Nie, Ziyu Zhao, Yichao Cao, Xiaobo Lu

TL;DR
FocusTrack introduces a one-stage 3D point cloud tracking framework that unifies motion and semantics modeling, achieving state-of-the-art accuracy and high speed without explicit segmentation.
Contribution
It proposes a novel one-stage framework built on Inter-frame Motion Modeling (IMM) and Focus-and-Suppress attention, overcoming the error accumulation and sequential-processing bottlenecks of two-stage methods in 3D point cloud tracking.
Findings
Achieves new SOTA performance on KITTI, nuScenes, and Waymo datasets.
Runs at 105 FPS, demonstrating high efficiency.
Effectively suppresses background noise and enhances foreground semantics.
Abstract
In 3D point cloud object tracking, motion-centric methods have emerged as a promising avenue due to their superior performance in modeling inter-frame motion. However, existing two-stage motion-based approaches suffer from fundamental limitations: (1) error accumulation due to decoupled optimization caused by explicit foreground segmentation prior to motion estimation, and (2) computational bottlenecks from sequential processing. To address these challenges, we propose FocusTrack, a novel one-stage tracking framework that unifies motion-semantics co-modeling through two core innovations: Inter-frame Motion Modeling (IMM) and Focus-and-Suppress Attention. The IMM module employs a temporal-difference siamese encoder to capture global motion patterns between adjacent frames. The Focus-and-Suppress attention enhances the foreground semantics via motion-salient feature…
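The abstract does not spell out how the temporal-difference siamese encoder is built, so the following is only a rough illustration of the general idea, not the authors' IMM module. It assumes a PointNet-style shared encoder (a per-point MLP followed by max pooling) applied with identical weights to two adjacent frames, and takes the difference of the two global features as a motion cue. All names (`encode`, `temporal_difference`, the weight shapes) are hypothetical, and a real implementation would use a tensor library rather than plain lists.

```python
import random

def linear_relu(vec, weights):
    """One dense layer (no bias) plus ReLU; `weights` holds one row per output unit."""
    return [max(0.0, sum(v * w for v, w in zip(vec, row))) for row in weights]

def encode(points, w1, w2):
    """Shared encoder (assumed design): per-point MLP, then max pooling over points."""
    per_point = [linear_relu(linear_relu(p, w1), w2) for p in points]
    # Channel-wise max makes the global feature independent of point order.
    return [max(channel) for channel in zip(*per_point)]

def temporal_difference(prev_points, curr_points, w1, w2):
    """Siamese step: the SAME weights encode both frames; the features are subtracted."""
    f_prev = encode(prev_points, w1, w2)
    f_curr = encode(curr_points, w1, w2)
    return [c - p for c, p in zip(f_curr, f_prev)]

# Toy data: random weights (untrained) and a frame shifted by 0.5 along x.
rng = random.Random(0)
w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(16)]   # 3 -> 16
w2 = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(8)]   # 16 -> 8
prev_frame = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(64)]
curr_frame = [[x + 0.5, y, z] for x, y, z in prev_frame]

motion_feat = temporal_difference(prev_frame, curr_frame, w1, w2)
static_feat = temporal_difference(prev_frame, prev_frame, w1, w2)
```

In this sketch, weight sharing means two identical frames produce a difference of exactly zero (`static_feat`), so any nonzero entry in `motion_feat` reflects a change between the frames rather than a mismatch between encoders.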
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · 3D Shape Modeling and Analysis
