MIC-BEV: Multi-Infrastructure Camera Bird's-Eye-View Transformer with Relation-Aware Fusion for 3D Object Detection
Yun Zhang, Zhaoliang Zheng, Johnson Liu, Zhiyu Huang, Zewei Zhou, Zonglin Meng, Tianhui Cai, and Jiaqi Ma

TL;DR
MIC-BEV is a Transformer-based bird's-eye-view (BEV) framework for infrastructure-based multi-camera 3D object detection. It supports a variable number of heterogeneous cameras, remains robust under sensor degradation, and achieves state-of-the-art results on the M2I and RoScenes datasets.
Contribution
Introduces MIC-BEV, a flexible, relation-aware fusion model for multi-camera infrastructure perception, and M2I, a synthetic dataset for training and evaluating infrastructure-based detection.
Findings
MIC-BEV outperforms existing methods on M2I and RoScenes datasets.
MIC-BEV remains robust under adverse weather and sensor degradation.
The M2I dataset provides diverse scenarios for infrastructure-based detection.
Abstract
Infrastructure-based perception plays a crucial role in intelligent transportation systems, offering global situational awareness and enabling cooperative autonomy. However, existing camera-based detection models often underperform in such scenarios due to challenges such as multi-view infrastructure setup, diverse camera configurations, degraded visual inputs, and various road layouts. We introduce MIC-BEV, a Transformer-based bird's-eye-view (BEV) perception framework for infrastructure-based multi-camera 3D object detection. MIC-BEV flexibly supports a variable number of cameras with heterogeneous intrinsic and extrinsic parameters and demonstrates strong robustness under sensor degradation. The proposed graph-enhanced fusion module in MIC-BEV integrates multi-view image features into the BEV space by exploiting geometric relationships between cameras and BEV cells alongside latent…
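The abstract only summarizes how camera features reach the BEV grid. Below is a minimal sketch, not the authors' implementation, of the geometric part of that idea. Each BEV cell center is projected into an arbitrary list of cameras, where every camera has its own intrinsics `K` and world-to-camera extrinsics `R`, `t`. Features from the cameras that actually see the cell are then combined with normalized weights. The names `Camera`, `project`, `fuse_bev_cell`, the `temperature` parameter, and the depth-based softmax weighting are all hypothetical. MIC-BEV learns its camera-to-BEV relations with a graph-enhanced module; here a hand-crafted distance prior stands in for that learned relation.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
# Hypothetical stand-in for bilinear sampling of a camera feature map at pixel (u, v).
FeatureSampler = Callable[[float, float], List[float]]


@dataclass
class Camera:
    """Hypothetical camera record: each camera may have different K, R, t and image size."""
    K: Matrix            # 3x3 intrinsics
    R: Matrix            # 3x3 world-to-camera rotation
    t: Sequence[float]   # world-to-camera translation
    width: int
    height: int


def matvec(M: Matrix, v: Sequence[float]) -> List[float]:
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]


def project(cam: Camera, point: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Project a world point to pixel (u, v); return (u, v, depth) or None if not visible."""
    pc = [a + b for a, b in zip(matvec(cam.R, point), cam.t)]  # world -> camera frame
    depth = pc[2]
    if depth <= 1e-6:  # point is behind the camera
        return None
    uvw = matvec(cam.K, pc)
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    if not (0 <= u < cam.width and 0 <= v < cam.height):  # outside the image
        return None
    return u, v, depth


def fuse_bev_cell(
    cell_xyz: Sequence[float],
    cameras: List[Camera],
    samplers: List[FeatureSampler],
    temperature: float = 10.0,
) -> Optional[List[float]]:
    """Fuse features from any number of cameras for one BEV cell.

    Assumed weighting: softmax over visible cameras of -depth / temperature,
    so closer cameras contribute more. Returns None if no camera sees the cell.
    """
    scores: List[float] = []
    sampled: List[List[float]] = []
    for cam, sample in zip(cameras, samplers):
        proj = project(cam, cell_xyz)
        if proj is None:  # invisible cameras are simply masked out
            continue
        u, v, depth = proj
        scores.append(-depth / temperature)
        sampled.append(sample(u, v))
    if not sampled:
        return None
    m = max(scores)  # subtract max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(sampled[0])
    return [sum(w * f[k] for w, f in zip(weights, sampled)) for k in range(dim)]
```

Because the camera set is just a list, adding a camera, removing one, or dropping a degraded one only changes which terms enter the softmax. Nothing in the fusion step is tied to a fixed camera count or to shared calibration, which mirrors the flexibility the abstract describes. A learned model would replace the fixed depth prior with relation features produced by the network.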
