MCBLT: Multi-Camera Multi-Object 3D Tracking in Long Videos
Yizhou Wang, Tim Meinhardt, Orcun Cetintas, Cheng-Yen Yang, Sameer Satish Pusegaonkar, Benjamin Missaoui, Sujit Biswas, Zheng Tang, Laura Leal-Taixé

TL;DR
This paper introduces MCBLT, a novel 3D multi-camera multi-object tracking framework that aggregates multi-view images for accurate 3D detection and employs hierarchical GNNs for robust long-term tracking, achieving state-of-the-art results.
Contribution
The paper presents a new 3D detection and tracking framework that effectively integrates multi-view images and hierarchical GNNs, improving long-term association and generalizability.
Findings
Achieved 81.22 HOTA on the AICity'24 dataset.
Achieved 95.6 IDF1 on the WildTrack dataset.
Demonstrated superior long-term tracking performance.
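For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of their standard definitions, not code from the paper. IDF1 is the F1 score over identity-matched detections, and HOTA at a single localization threshold is the geometric mean of a detection accuracy and an association accuracy (the final HOTA score averages this over a range of thresholds). The function and argument names are illustrative assumptions.

```python
import math


def idf1(idtp: int, idfp: int, idfn: int) -> float:
    """IDF1 = 2*IDTP / (2*IDTP + IDFP + IDFN).

    idtp, idfp, idfn are the identity true positives, false positives,
    and false negatives under the best global identity matching.
    """
    denom = 2 * idtp + idfp + idfn
    return 2 * idtp / denom if denom else 0.0


def hota_at_threshold(det_a: float, ass_a: float) -> float:
    """HOTA at one localization threshold: sqrt(DetA * AssA).

    det_a and ass_a are the detection and association accuracies
    (each in [0, 1]) computed at that same threshold.
    """
    return math.sqrt(det_a * ass_a)


def hota(det_a_per_threshold, ass_a_per_threshold) -> float:
    """Final HOTA: mean of the per-threshold scores."""
    scores = [
        hota_at_threshold(d, a)
        for d, a in zip(det_a_per_threshold, ass_a_per_threshold)
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Both functions return values in [0, 1]; results such as 95.6 IDF1 are the same quantity reported as a percentage.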
Abstract
Object perception from multi-view cameras is crucial for intelligent systems, particularly in indoor environments, e.g., warehouses, retail stores, and hospitals. Most traditional multi-target multi-camera (MTMC) detection and tracking methods rely on 2D object detection, single-view multi-object tracking (MOT), and cross-view re-identification (ReID) techniques, without properly handling important 3D information by multi-view image aggregation. In this paper, we propose a 3D object detection and tracking framework, named MCBLT, which first aggregates multi-view images with necessary camera calibration parameters to obtain 3D object detections in bird's-eye view (BEV). Then, we introduce hierarchical graph neural networks (GNNs) to track these 3D detections in BEV for MTMC tracking results. Unlike existing methods, MCBLT has impressive generalizability across different scenes and…
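To illustrate why camera calibration parameters are needed for multi-view aggregation, here is a minimal sketch, assuming a standard pinhole camera model, of how a 3D location (for example, a cell of the BEV grid) maps to a pixel in one camera. This is not MCBLT's implementation; `project_point` and the toy calibration values are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = List[List[float]]


def project_point(K: Mat3, R: Mat3, t: Vec3, X: Vec3) -> Optional[Tuple[float, float]]:
    """Project a 3D world point X into pixel coordinates (u, v).

    K is the 3x3 intrinsic matrix; R and t are the world-to-camera
    rotation and translation. Returns None when the point lies behind
    the camera and is therefore not visible in this view.
    """
    # World -> camera coordinates: Xc = R @ X + t
    xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if xc[2] <= 0:
        return None
    # Camera -> homogeneous pixel coordinates: p = K @ Xc
    p = [sum(K[i][j] * xc[j] for j in range(3)) for i in range(3)]
    # Perspective division by depth
    return (p[0] / p[2], p[1] / p[2])


# Toy calibration (assumed values): 1000 px focal length,
# principal point at (640, 360), camera at the world origin.
K = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = (0.0, 0.0, 0.0)

pixel = project_point(K, R, t, (1.0, 0.5, 10.0))
```

Repeating this projection for every calibrated camera tells a multi-view detector which pixels in each view correspond to the same 3D location, which is the general idea behind fusing per-camera image features into a shared BEV representation.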
Taxonomy
Topics: Image Processing Techniques and Applications · Advanced Image and Video Retrieval Techniques
