CoIn3D: Revisiting Configuration-Invariant Multi-Camera 3D Object Detection
Zhaonian Kuang, Rui Ding, Haotian Wang, Xinhu Zheng, Meng Yang, Gang Hua

TL;DR
This paper introduces CoIn3D, a novel framework for multi-camera 3D object detection that enhances generalization to unseen camera configurations by explicitly modeling spatial priors and employing configuration-aware data augmentation.
Contribution
CoIn3D is the first approach to explicitly incorporate spatial priors and dynamic view synthesis for improved cross-configuration generalization in MC3D.
Findings
Achieves strong cross-configuration performance on the nuScenes, Waymo, and Lyft datasets.
Outperforms existing methods in unseen camera configuration scenarios.
Effective across multiple MC3D paradigms like BEVDepth, BEVFormer, and PETR.
Abstract
Multi-camera 3D object detection (MC3D) has attracted increasing attention with the growing deployment of multi-sensor physical agents, such as robots and autonomous vehicles. However, MC3D models still struggle to generalize to unseen platforms with new multi-camera configurations. Current solutions simply employ a meta-camera for unified representation but lack comprehensive consideration. In this paper, we revisit this issue and identify that the devil lies in spatial prior discrepancies across source and target configurations, including different intrinsics, extrinsics, and array layouts. To address this, we propose CoIn3D, a generalizable MC3D framework that enables strong transferability from source configurations to unseen target ones. CoIn3D explicitly incorporates all identified spatial priors into both feature embedding and image observation through spatial-aware feature…
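To make the notion of a spatial prior discrepancy concrete, the sketch below projects one 3D point through two pinhole cameras that differ in intrinsics and extrinsics. It is only an illustration of the standard pinhole model, not the CoIn3D method; the helper names (`intrinsics`, `project`) and every numeric value are hypothetical and not taken from the paper or from any dataset.

```python
import numpy as np

def intrinsics(f, cx, cy):
    """Build a 3x3 pinhole intrinsic matrix (square pixels, no skew)."""
    return np.array([[f, 0.0, cx],
                     [0.0, f, cy],
                     [0.0, 0.0, 1.0]])

def project(point_ego, K, R, t):
    """Project a 3D ego-frame point to pixel coordinates.

    R and t map ego coordinates to camera coordinates
    (x_cam = R @ x_ego + t). Camera axes: x right, y down, z forward.
    Returns None when the point lies behind the camera.
    """
    p_cam = R @ point_ego + t
    if p_cam[2] <= 0:
        return None
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

# One object centre, 20 m ahead and 2 m to the right (made-up values).
point = np.array([2.0, 0.0, 20.0])
R = np.eye(3)  # for simplicity, both cameras are aligned with the ego axes

# Hypothetical "source" camera: shorter focal length.
src_px = project(point, intrinsics(1200.0, 800.0, 450.0), R, np.zeros(3))

# Hypothetical "target" camera: longer focal length, different principal
# point, and a translation that shifts the point 0.5 m along camera y.
tgt_px = project(point, intrinsics(2000.0, 960.0, 640.0), R,
                 np.array([0.0, 0.5, 0.0]))

print("source pixel:", src_px)
print("target pixel:", tgt_px)
```

The same physical point lands at different pixel locations under the two configurations, and the same pixel offset corresponds to a different metric offset, which is the kind of mismatch a detector trained on only one configuration has to cope with.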
Taxonomy
Topics: Advanced Neural Network Applications · Face recognition and analysis · Robotics and Sensor-Based Localization
