SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking

Shubo Lin; Yutong Kou; Zirui Wu; Shaoru Wang; Bing Li; Weiming Hu; Jin Gao

arXiv:2411.06780 · cs.CV · May 19, 2025

TL;DR

SynCL introduces a synergistic training strategy with instance-aware contrastive learning for end-to-end multi-camera 3D tracking, overcoming optimization challenges and achieving state-of-the-art results without extra inference costs.

Contribution

The paper proposes a training framework that combines hybrid matching, dynamic query filtering, and contrastive learning to improve multi-camera 3D tracking performance.

Findings

01 Achieves 58.9% AMOTA on the nuScenes dataset.

02 Improves detection and tracking accuracy without additional inference costs.

03 Outperforms existing query-based 3D visual trackers.

Abstract

While existing query-based 3D end-to-end visual trackers integrate detection and tracking via the tracking-by-attention paradigm, these two chicken-and-egg tasks encounter optimization difficulties when sharing the same parameters. Our findings reveal that these difficulties arise due to two inherent constraints on the self-attention mechanism, i.e., over-deduplication for object queries and self-centric attention for track queries. In contrast, removing the self-attention mechanism not only minimally impacts regression predictions of the tracker, but also tends to generate more latent candidate boxes. Based on these analyses, we present SynCL, a novel plug-and-play synergistic training strategy designed to co-facilitate multi-task learning for detection and tracking. Specifically, we propose a Task-specific Hybrid Matching module for a weight-shared cross-attention-based decoder that…
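
The abstract excerpt above is truncated and does not spell out the contrastive objective, so the following is only a generic InfoNCE-style sketch of what "instance-aware" contrastive learning could look like: query embeddings are pulled toward embeddings that share the same instance ID and pushed away from the rest. The function name, arguments, and temperature value are all hypothetical and are not taken from the paper.

```python
import numpy as np


def instance_contrastive_loss(queries, keys, query_ids, key_ids, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's formulation).

    queries:   (num_queries, dim) embeddings, e.g. track queries
    keys:      (num_keys, dim) embeddings, e.g. object queries
    query_ids: (num_queries,) integer instance ID per query
    key_ids:   (num_keys,) integer instance ID per key
    """
    # L2-normalize so dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / temperature  # shape: (num_queries, num_keys)

    # Numerically stable log-softmax over the keys.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # A key is a positive for a query when both carry the same instance ID.
    positives = query_ids[:, None] == key_ids[None, :]
    has_positive = positives.any(axis=1)
    if not has_positive.any():
        return 0.0

    # Average log-probability over each query's positives, then over queries.
    pos_log_prob = (log_prob * positives).sum(axis=1)[has_positive]
    pos_log_prob = pos_log_prob / positives.sum(axis=1)[has_positive]
    return float(-pos_log_prob.mean())


if __name__ == "__main__":
    embeddings = np.eye(3)
    ids = np.array([0, 1, 2])
    # Embeddings paired with their own instance give a much lower loss
    # than the same embeddings paired with the wrong instance.
    print(instance_contrastive_loss(embeddings, embeddings, ids, ids))
    print(instance_contrastive_loss(embeddings, embeddings[::-1], ids, ids))
```

Because a loss of this kind is applied only during training, it is consistent with the page's claim of no additional inference cost, although the actual SynCL objective may differ in how positives and negatives are chosen.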

Videos

SynCL: A Synergistic Training Strategy with Instance-Aware Contrastive Learning for End-to-End Multi-Camera 3D Tracking · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques

Methods: Softmax · Attention Is All You Need · Contrastive Learning · ALIGN · ADaptive gradient method with the OPTimal convergence rate