A Unified 3D Object Perception Framework for Real-Time Outside-In Multi-Camera Systems
Yizhou Wang, Sameer Pusegaonkar, Yuxing Wang, Anqi Li, Vishal Kumar, Chetan Sethi, Ganapathy Aiyer, Yun He, Kartikay Thakkar, Swapnil Rathi, Bhushan Rupde, Zheng Tang, Sujit Biswas

TL;DR
This paper introduces a real-time, large-scale 3D object perception framework for outside-in multi-camera systems, combining geometric priors, occlusion-aware ReID, and generative data augmentation to improve accuracy and efficiency.
Contribution
It presents an adapted Sparse4D framework with domain gap bridging, occlusion-aware ReID, and an optimized TensorRT plugin for real-time multi-camera 3D perception in infrastructure environments.
Findings
Achieved a state-of-the-art HOTA score of 45.22 on AI City Challenge 2025.
Developed a hardware-accelerated implementation with 2.15x speedup on modern GPUs.
Supported over 64 concurrent camera streams on a single GPU.
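For context on the HOTA figure above, the standard definition of the metric (Higher Order Tracking Accuracy) is sketched below. This is the general formulation, not taken from this paper: the challenge may use a 3D-specific matching criterion in place of 2D IoU, and the reported 45.22 is presumably on a 0 to 100 scale. The symbols TP, FN, FP (detection matches at localization threshold α) and TPA, FNA, FPA (association matches for a given true positive c) follow the usual HOTA notation.

```latex
\mathrm{DetA}_\alpha = \frac{|\mathrm{TP}|}{|\mathrm{TP}| + |\mathrm{FN}| + |\mathrm{FP}|},
\qquad
\mathrm{AssA}_\alpha = \frac{1}{|\mathrm{TP}|} \sum_{c \in \mathrm{TP}}
  \frac{|\mathrm{TPA}(c)|}{|\mathrm{TPA}(c)| + |\mathrm{FNA}(c)| + |\mathrm{FPA}(c)|}

\mathrm{HOTA}_\alpha = \sqrt{\mathrm{DetA}_\alpha \cdot \mathrm{AssA}_\alpha},
\qquad
\mathrm{HOTA} = \frac{1}{19} \sum_{\alpha \in \{0.05,\, 0.10,\, \ldots,\, 0.95\}} \mathrm{HOTA}_\alpha
```

Because the score is a geometric mean of detection and association accuracy, a tracker must do well at both finding objects and keeping their identities consistent across cameras to score highly.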
Abstract
Accurate 3D object perception and multi-target multi-camera (MTMC) tracking are fundamental for the digital transformation of industrial infrastructure. However, transitioning "inside-out" autonomous driving models to "outside-in" static camera networks presents significant challenges due to heterogeneous camera placements and extreme occlusion. In this paper, we present an adapted Sparse4D framework specifically optimized for large-scale infrastructure environments. Our system leverages absolute world-coordinate geometric priors and introduces an occlusion-aware ReID embedding module to maintain identity stability across distributed sensor networks. To bridge the Sim2Real domain gap without manual labeling, we employ a generative data augmentation strategy using the NVIDIA COSMOS framework, creating diverse environmental styles that enhance the model's appearance-invariance. Evaluated…
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Advanced Vision and Imaging
