UniDrive: Towards Universal Driving Perception Across Camera Configurations
Ye Li, Wenzhao Zheng, Xiaonan Huang, Kurt Keutzer

TL;DR
UniDrive introduces a unified virtual camera framework and optimization method to enable autonomous driving perception models to generalize across different camera configurations, improving adaptability and robustness.
Contribution
The paper proposes a novel virtual camera projection and configuration optimization approach that can be integrated into existing perception models for universal applicability.
Findings
With UniDrive, models trained on one camera configuration generalize to other configurations with little performance loss.
The virtual camera approach reduces performance degradation across configurations.
Experiments on data generated in the CARLA simulator validate the effectiveness of UniDrive.
Abstract
Vision-centric autonomous driving has demonstrated excellent performance with economical sensors. As the fundamental step, 3D perception aims to infer 3D information from 2D images based on 3D-2D projection. This makes driving perception models susceptible to sensor configuration (e.g., camera intrinsics and extrinsics) variations. However, generalizing across camera configurations is important for deploying autonomous driving models on different car models. In this paper, we present UniDrive, a novel framework for vision-centric autonomous driving to achieve universal perception across camera configurations. We deploy a set of unified virtual cameras and propose a ground-aware projection method to effectively transform the original images into these unified virtual views. We further propose a virtual configuration optimization method by minimizing the expected projection error between…
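The idea of re-rendering an original camera image into a virtual view can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes ideal pinhole cameras, a perfectly flat ground plane at z = 0 in a vehicle frame (x forward, y left, z up), and it handles ground pixels only. The names `Camera`, `FORWARD`, `pixel_to_ground`, `project`, and `virtual_to_original` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Camera:
    """Hypothetical pinhole camera (illustrative, not from the paper)."""
    f: float               # focal length in pixels
    cx: float              # principal point, x
    cy: float              # principal point, y
    R: List[List[float]]   # camera-to-vehicle rotation (3x3)
    c: Vec3                # camera centre in the vehicle frame


# Rotation for a forward-facing camera: camera axes are x right, y down,
# z forward; vehicle axes are x forward, y left, z up.
FORWARD = [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]]


def pixel_to_ground(cam: Camera, u: float, v: float) -> Optional[Vec3]:
    """Back-project a pixel and intersect its ray with the plane z = 0."""
    d_cam = ((u - cam.cx) / cam.f, (v - cam.cy) / cam.f, 1.0)
    d = tuple(sum(cam.R[i][j] * d_cam[j] for j in range(3)) for i in range(3))
    if d[2] >= 0.0:
        return None  # ray points at or above the horizon, never hits ground
    t = -cam.c[2] / d[2]  # assumes the camera sits above the ground
    return tuple(cam.c[i] + t * d[i] for i in range(3))


def project(cam: Camera, p: Vec3) -> Optional[Tuple[float, float]]:
    """Project a vehicle-frame 3D point into the camera image."""
    rel = tuple(p[i] - cam.c[i] for i in range(3))
    # Multiply by R transposed to go from vehicle frame to camera frame.
    x, y, z = (sum(cam.R[i][j] * rel[i] for i in range(3)) for j in range(3))
    if z <= 0.0:
        return None  # point is behind the camera
    return (cam.f * x / z + cam.cx, cam.f * y / z + cam.cy)


def virtual_to_original(virtual: Camera, original: Camera,
                        u: float, v: float) -> Optional[Tuple[float, float]]:
    """Find where a virtual-view ground pixel should be sampled from."""
    ground_point = pixel_to_ground(virtual, u, v)
    if ground_point is None:
        return None
    return project(original, ground_point)


if __name__ == "__main__":
    virtual = Camera(1000.0, 800.0, 450.0, FORWARD, (0.0, 0.0, 1.5))
    original = Camera(1000.0, 800.0, 450.0, FORWARD, (0.0, 0.0, 2.0))
    print(virtual_to_original(virtual, original, 800.0, 600.0))
```

Running the lookup for every pixel of the virtual image and sampling the original image at the returned coordinates would give a ground-only warp. Pixels above the horizon return `None` here and would need a different depth assumption.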
Peer Reviews
Decision: ICLR 2025 Poster
- This paper addresses a valuable real-world demand. As autonomous driving sensor configurations (e.g., camera intrinsics, extrinsics, and the number of cameras) evolve, it becomes crucial to ensure the perception algorithms are robust to such changes.
- The authors propose a re-mapping approach that transforms pixels from the original view into a unified virtual view, enabling training and inference to be conducted on this virtual view. This approach enhances robustness to cross-configuration s
- As the authors noted, one main limitation is that all analyses were conducted in a simulated environment. While this controlled setting is useful for evaluating the method’s potential, it raises concerns about real-world applicability, as training-based detectors are often sensitive to dataset-specific factors. Given the paper’s real-world motivation, validation on actual datasets would be valuable.
- As I posed in Strengths, it is a little counterintuitive that the transformation from the ori
The paper addresses a valuable and relevant problem in autonomous driving, focusing on camera configuration generalization. The proposed approach sounds generally reasonable, employing a unified virtual camera space and ground-aware projection to help manage variability in camera setups. Experimental results show clear performance gains, with UniDrive maintaining high perception accuracy across diverse camera configurations and outperforming baseline models.
W1: The innovation is somewhat limited, as the framework mainly leverages established virtual projection and optimization techniques without presenting substantially novel concepts.
W2: The approach relies heavily on simulated environments (e.g., CARLA), raising concerns about its applicability and robustness in real-world conditions.
1. The paper proposes to transform images into a unified virtual camera space, improving robustness across various camera configurations.
2. Additionally, the paper proposes a virtual configuration optimization strategy that minimizes projection errors.
3. The paper also presents a systematic data generation platform and a benchmark for evaluating perception models under different camera configurations.
See the Questions section.
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Vision and Imaging · Advanced Neural Network Applications
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator · Sparse Evolutionary Training
