SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking
Shubo Lin, Yutong Kou, Zirui Wu, Shaoru Wang, Bing Li, Weiming Hu, Jin Gao

TL;DR
SynCL is a synergistic training strategy with instance-aware contrastive learning for end-to-end multi-camera 3D tracking. It targets the optimization difficulties that arise when detection and tracking share the same parameters, and achieves state-of-the-art results without extra inference costs.
Contribution
The paper proposes a plug-and-play training strategy that combines task-specific hybrid matching, dynamic query filtering, and instance-aware contrastive learning to improve multi-camera 3D tracking performance.
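To make the contrastive-learning ingredient concrete, here is a minimal sketch of an InfoNCE-style loss keyed on instance identity. It is an illustration under assumptions, not the authors' formulation: the function name `instance_contrastive_loss`, the `temperature` parameter, and the choice of cosine similarity are all hypothetical. Embeddings that share an instance id are treated as positives, and all others as negatives.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def instance_contrastive_loss(anchors, candidates, anchor_ids, candidate_ids,
                              temperature=0.1):
    """InfoNCE-style loss over query embeddings (hypothetical sketch).

    Each anchor is pulled toward candidates with the same instance id and
    pushed away from every other candidate. Anchors with no positive among
    the candidates are skipped. Returns the mean loss, or 0.0 if no anchor
    has a positive.
    """
    anchors = [l2_normalize(a) for a in anchors]
    candidates = [l2_normalize(c) for c in candidates]
    losses = []
    for anchor, anchor_id in zip(anchors, anchor_ids):
        # Cosine similarity to every candidate, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(anchor, c)) / temperature
                  for c in candidates]
        positives = [logit for logit, cid in zip(logits, candidate_ids)
                     if cid == anchor_id]
        if not positives:
            continue
        # Numerically stable log-sum-exp over all candidates.
        top = max(logits)
        log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
        # Average negative log-probability over this anchor's positives.
        losses.append(sum(log_denominator - p for p in positives) / len(positives))
    return sum(losses) / len(losses) if losses else 0.0


# Two anchors and two candidates; ids decide which pairs count as positives.
anchors = [[1.0, 0.0], [0.0, 1.0]]
candidates = [[1.0, 0.0], [0.0, 1.0]]
aligned = instance_contrastive_loss(anchors, candidates, [1, 2], [1, 2])
swapped = instance_contrastive_loss(anchors, candidates, [1, 2], [2, 1])
print(aligned < swapped)
```

The loss is small when same-instance embeddings point in the same direction and large when the ids are swapped, which is the behavior a contrastive objective of this kind is meant to encourage.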
Findings
Achieves 58.9% AMOTA on the nuScenes dataset.
Improves detection and tracking accuracy without additional inference costs.
Outperforms existing query-based 3D visual trackers.
Abstract
While existing query-based 3D end-to-end visual trackers integrate detection and tracking via the tracking-by-attention paradigm, these two chicken-and-egg tasks encounter optimization difficulties when sharing the same parameters. Our findings reveal that these difficulties arise due to two inherent constraints on the self-attention mechanism, i.e., over-deduplication for object queries and self-centric attention for track queries. In contrast, removing the self-attention mechanism not only minimally impacts regression predictions of the tracker, but also tends to generate more latent candidate boxes. Based on these analyses, we present SynCL, a novel plug-and-play synergistic training strategy designed to co-facilitate multi-task learning for detection and tracking. Specifically, we propose a Task-specific Hybrid Matching module for a weight-shared cross-attention-based decoder that…
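The contrast the abstract draws between decoders with and without self-attention can be illustrated with a toy single-head layer. This is a sketch under strong simplifying assumptions (identity projections, no normalization, no feed-forward block), and `decoder_layer` and `use_self_attention` are hypothetical names rather than the paper's API. It demonstrates one structural fact only: with self-attention disabled, each query's output depends on that query and the image features alone, so queries cannot influence one another.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Single-head scaled dot-product attention with identity projections."""
    scale = math.sqrt(len(keys[0]))
    outputs = []
    for q in queries:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs


def add(xs, ys):
    """Element-wise residual addition of two lists of vectors."""
    return [[a + b for a, b in zip(x, y)] for x, y in zip(xs, ys)]


def decoder_layer(queries, memory, use_self_attention=True):
    """Toy decoder layer: optional query self-attention, then cross-attention."""
    if use_self_attention:
        # Queries exchange information with each other.
        queries = add(queries, attend(queries, queries, queries))
    # Every query reads from the shared image features (memory).
    return add(queries, attend(queries, memory, memory))


memory = [[0.5, 0.5], [1.0, -1.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
perturbed = [[1.0, 0.0], [3.0, -2.0]]  # only the second query changes

# Without self-attention, the first query's output ignores the second query.
independent = (decoder_layer(queries, memory, False)[0]
               == decoder_layer(perturbed, memory, False)[0])
# With self-attention, the first query's output reacts to the second query.
coupled = (decoder_layer(queries, memory, True)[0]
           != decoder_layer(perturbed, memory, True)[0])
print(independent, coupled)
```

Because the cross-attention-only path shares no state between queries, near-duplicate queries can produce near-duplicate predictions. This is consistent with the abstract's observation that removing self-attention tends to yield more candidate boxes, though the toy layer says nothing about accuracy.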
Taxonomy
Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
Methods: Softmax · Attention Is All You Need · Contrastive Learning · ALIGN · ADaptive gradient method with the OPTimal convergence rate
