Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking
Teli Ma, Mengmeng Wang, Jimin Xiao, Huifeng Wu, Yong Liu

TL;DR
This paper introduces SyncTrack, a single-branch framework for 3D LiDAR object tracking that synchronizes feature extraction and matching, reducing complexity and improving performance using Transformer-based dynamic affinity and attentive point sampling.
Contribution
The novel SyncTrack framework eliminates the need for separate matching networks by synchronizing feature extraction and matching within a single Transformer-based architecture.
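To make the single-branch idea concrete, here is a minimal pure-Python sketch. It is not the authors' implementation. It assumes the template and search-region points have already been embedded as token vectors of equal width. It uses a single attention head with random projection weights and omits the feed-forward, normalization, and positional-encoding parts of a real Transformer layer. All function names are hypothetical. The point it shows is that one self-attention pass over the concatenated tokens produces both within-branch aggregation (feature extraction) and a search-to-template affinity block (matching). A Siamese tracker would instead run its encoder once per branch and then need a separate matching network.

```python
import math
import random


def softmax(row):
    # Numerically stable softmax over one row of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    # Plain-list matrix product: (n x k) @ (k x m) -> (n x m).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


def joint_attention(template, search, w_q, w_k, w_v):
    """Single-head self-attention over the concatenated [template; search] tokens.

    Hypothetical sketch: one softmax covers both within-branch aggregation
    (feature extraction) and cross-branch aggregation (matching).
    """
    tokens = template + search  # concatenate along the token axis
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)
    scale = math.sqrt(len(w_k[0]))
    attn = [softmax([s / scale for s in row]) for row in matmul(q, transpose(k))]
    out = matmul(attn, v)
    n_t = len(template)
    return out[:n_t], out[n_t:], attn


def cross_affinity(attn, n_template):
    # Search-to-template block: rows are search tokens, columns are template tokens.
    return [row[:n_template] for row in attn[n_template:]]


# Usage with toy random features (4 template points, 6 search points, width 8).
rng = random.Random(0)
dim = 8
template = random_matrix(4, dim, rng)
search = random_matrix(6, dim, rng)
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
new_template, new_search, attn = joint_attention(template, search, w_q, w_k, w_v)
affinity = cross_affinity(attn, len(template))
```

In this sketch the encoder runs only once on the joint token set. The template-search correlation is simply a slice of the attention matrix, so it needs no extra parameters.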
Findings
Achieves state-of-the-art real-time tracking performance on the KITTI and nuScenes datasets.
Reduces computational complexity by avoiding duplicate encoder forwarding.
Enhances feature discrimination with attentive point sampling strategy.
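The attentive point sampling finding can be illustrated with a small sketch. In a single-branch layer, the attention matrix already records how strongly each search-region point relates to the template. That information can therefore be reused to decide which points to keep, rather than choosing points by position alone. The scoring rule below is an assumption made for illustration: each search point is scored by the sum of its attention weights on template tokens, and the top k are kept. The name `attentive_sample` and the toy matrix are hypothetical. This is not the paper's exact Attentive Points-Sampling strategy.

```python
def attentive_sample(attn, n_template, k):
    """Pick k search points by how much attention they place on template tokens.

    HYPOTHETICAL scoring rule for illustration only; SyncTrack's actual
    Attentive Points-Sampling criterion may differ.
    attn: full row-normalized attention matrix over [template; search] tokens.
    Returns search-token indices (0-based within the search region), in order.
    """
    scored = []
    for i, row in enumerate(attn[n_template:]):
        template_mass = sum(row[:n_template])  # attention this search point puts on the template
        scored.append((template_mass, i))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return sorted(i for _, i in scored[:k])


# Toy attention matrix: 2 template tokens followed by 4 search tokens.
toy_attn = [
    [0.50, 0.50, 0.00, 0.00, 0.00, 0.00],  # template 0
    [0.50, 0.50, 0.00, 0.00, 0.00, 0.00],  # template 1
    [0.05, 0.05, 0.30, 0.20, 0.20, 0.20],  # search 0: template mass 0.10
    [0.40, 0.30, 0.10, 0.10, 0.05, 0.05],  # search 1: template mass 0.70
    [0.10, 0.10, 0.20, 0.20, 0.20, 0.20],  # search 2: template mass 0.20
    [0.25, 0.35, 0.10, 0.10, 0.10, 0.10],  # search 3: template mass 0.60
]
print(attentive_sample(toy_attn, n_template=2, k=2))  # prints [1, 3]
```

Under this rule, the points kept for the next layer are the ones most associated with the template. This is one plausible way that sampling could sharpen feature discrimination.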
Abstract
The Siamese network has been a de facto benchmark framework for 3D LiDAR object tracking, with a shared-parametric encoder extracting features from the template and the search region, respectively. This paradigm relies heavily on an additional matching network to model the cross-correlation/similarity between the template and the search region. In this paper, we forsake the conventional Siamese paradigm and propose a novel single-branch framework, SyncTrack, which synchronizes feature extraction and matching. This avoids both forwarding the encoder twice (once for the template and once for the search region) and introducing the extra parameters of a matching network. The synchronization mechanism is based on the dynamic affinity of the Transformer, and an in-depth theoretical analysis of this relevance is provided. Moreover, based on the synchronization, we introduce a novel Attentive Points-Sampling strategy into the Transformer layers…
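The abstract ties synchronization to the dynamic affinity of the Transformer. One way to see the connection is through standard self-attention algebra on the concatenated token matrix. The block notation below (X_t, X_s, A_st and so on) is introduced here for illustration and is not taken from the paper's own derivation. X_t holds the N_t template tokens, X_s holds the N_s search tokens, d is the key width, and the softmax is applied row-wise.

```latex
X = \begin{bmatrix} X_t \\ X_s \end{bmatrix},\quad
Q = XW_Q,\; K = XW_K,\; V = XW_V,\quad
A = \operatorname{softmax}\!\left(\frac{QK^\top}{\sqrt{d}}\right)
  = \begin{bmatrix} A_{tt} & A_{ts} \\ A_{st} & A_{ss} \end{bmatrix}

\begin{bmatrix} Y_t \\ Y_s \end{bmatrix} = AV
\;\Rightarrow\;
Y_s = \underbrace{A_{st} V_t}_{\text{search}\to\text{template (matching)}}
    + \underbrace{A_{ss} V_s}_{\text{search}\to\text{search (extraction)}},
\qquad A_{st}\mathbf{1} + A_{ss}\mathbf{1} = \mathbf{1}
```

Each row of A is normalized over the template and search columns together. As a result, the weight a search token gives to template evidence (A_st) versus its own region (A_ss) is recomputed from the input at every layer. The template-search correlation therefore comes out of the encoder itself rather than from a separately parameterized matching network.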
Taxonomy
Topics: Video Surveillance and Tracking Methods · Face recognition and analysis · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dropout · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Adam · Layer Normalization · Dense Connections
