CoIn3D: Revisiting Configuration-Invariant Multi-Camera 3D Object Detection

Zhaonian Kuang; Rui Ding; Haotian Wang; Xinhu Zheng; Meng Yang; Gang Hua

arXiv:2603.05042·cs.CV·March 27, 2026

TL;DR

This paper introduces CoIn3D, a novel framework for multi-camera 3D object detection that enhances generalization to unseen camera configurations by explicitly modeling spatial priors and employing configuration-aware data augmentation.

Contribution

CoIn3D is the first approach to explicitly incorporate spatial priors and dynamic view synthesis to improve cross-configuration generalization in multi-camera 3D object detection (MC3D).

Findings

1. Achieves strong cross-configuration performance on the nuScenes, Waymo, and Lyft datasets.
2. Outperforms existing methods in unseen camera configuration scenarios.
3. Is effective across multiple MC3D paradigms, including BEVDepth, BEVFormer, and PETR.

Abstract

Multi-camera 3D object detection (MC3D) has attracted increasing attention with the growing deployment of multi-sensor physical agents, such as robots and autonomous vehicles. However, MC3D models still struggle to generalize to unseen platforms with new multi-camera configurations. Current solutions simply employ a meta-camera for unified representation but lack comprehensive consideration. In this paper, we revisit this issue and identify that the devil lies in spatial prior discrepancies across source and target configurations, including different intrinsics, extrinsics, and array layouts. To address this, we propose CoIn3D, a generalizable MC3D framework that enables strong transferability from source configurations to unseen target ones. CoIn3D explicitly incorporates all identified spatial priors into both feature embedding and image observation through spatial-aware feature…
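To make the intrinsic and extrinsic discrepancies concrete, the sketch below projects one 3D point through two pinhole cameras that differ in focal length, principal point, mounting position, and yaw. It only illustrates the configuration gap the abstract describes and is not CoIn3D's method. The function name `project_point`, the axis convention (x right, y down, z forward), and all numeric values are assumptions made for this example.

```python
import math

def project_point(point, focal_px, cx, cy, cam_pos, yaw_rad):
    """Project a 3D point to pixel coordinates with a simple pinhole model.

    Hypothetical helper for illustration. Axes: x right, y down, z forward.
    Intrinsics: focal_px, cx, cy. Extrinsics: cam_pos and yaw about the y axis.
    Returns None when the point lies behind the camera.
    """
    # Extrinsics, step 1: shift the point so the camera sits at the origin.
    dx = point[0] - cam_pos[0]
    dy = point[1] - cam_pos[1]
    dz = point[2] - cam_pos[2]

    # Extrinsics, step 2: express the offset in the yawed camera frame.
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    xc = c * dx - s * dz   # component along the camera's right axis
    yc = dy                # yaw does not change the vertical component
    zc = s * dx + c * dz   # component along the camera's forward axis

    if zc <= 0:
        return None

    # Intrinsics: perspective divide, then scale and shift into pixels.
    u = focal_px * xc / zc + cx
    v = focal_px * yc / zc + cy
    return (u, v)

# The same object center, 10 m ahead and 2 m to the right (assumed values).
obj = (2.0, 0.0, 10.0)

# Assumed "source" camera: shorter focal length, mounted at the origin.
src_px = project_point(obj, focal_px=1000.0, cx=800.0, cy=450.0,
                       cam_pos=(0.0, 0.0, 0.0), yaw_rad=0.0)

# Assumed "target" camera: longer focal length, mounted 0.5 m higher.
tgt_px = project_point(obj, focal_px=2000.0, cx=960.0, cy=640.0,
                       cam_pos=(0.0, -0.5, 0.0), yaw_rad=0.0)

print("source pixel:", src_px)
print("target pixel:", tgt_px)
```

The same object lands at different pixel locations and apparent scales in the two cameras, which is the kind of shift a detector trained on one configuration has to cope with when deployed on another.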

Taxonomy

Topics: Advanced Neural Network Applications · Face recognition and analysis · Robotics and Sensor-Based Localization