DriveCamSim: Generalizable Camera Simulation via Explicit Camera Modeling for Autonomous Driving

Wenchao Sun; Xuewu Lin; Keyu Chen; Zixiang Pei; Yining Shi; Chuang Zhang; Sifa Zheng

arXiv:2505.19692·cs.CV·May 27, 2025


TL;DR

DriveCamSim introduces a generalizable camera simulation framework for autonomous driving that explicitly models camera parameters and maintains high visual quality, enabling flexible, controllable, and robust multi-view video generation.

Contribution

The paper proposes Explicit Camera Modeling (ECM) for decoupling camera configuration from the model, enhancing generalization and controllability in camera simulation for autonomous driving.

Findings

1. Superior visual quality and controllability demonstrated.
2. Effective generalization across camera parameters and frame rates.
3. Enhanced temporal consistency and identity-awareness.

Abstract

Camera sensor simulation plays a critical role in autonomous driving (AD), e.g., for evaluating vision-based AD algorithms. While existing approaches have leveraged generative models for controllable image/video generation, they remain constrained to generating multi-view video sequences with fixed camera viewpoints and video frequency, significantly limiting their downstream applications. To address this, we present a generalizable camera simulation framework, DriveCamSim, whose core innovation lies in the proposed Explicit Camera Modeling (ECM) mechanism. Instead of implicit interaction through vanilla attention, ECM establishes explicit pixel-wise correspondences across multi-view and multi-frame dimensions, decoupling the model from overfitting to the specific camera configurations (intrinsic/extrinsic parameters, number of views) and temporal sampling rates presented in the training…
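To make the idea of an explicit pixel-wise correspondence between cameras concrete, the sketch below maps one pixel from a source view to a destination view using standard pinhole geometry. It is an illustration only, not the paper's ECM implementation: the function name `reproject_pixel`, the assumption of a known per-pixel depth, and the camera-to-world matrix convention are all hypothetical choices made for this example.

```python
import numpy as np

def reproject_pixel(uv, depth, K_src, T_src_to_world, K_dst, T_dst_to_world):
    """Map a source-camera pixel to destination-camera pixel coordinates.

    Assumptions (illustrative, not from the paper): pinhole cameras with
    3x3 intrinsics K, 4x4 camera-to-world extrinsics T, and a known depth
    along the source camera's z-axis.
    """
    u, v = uv
    # Back-project the pixel to a 3D point in the source camera frame.
    p_src = depth * (np.linalg.inv(K_src) @ np.array([u, v, 1.0]))
    # Source camera frame -> world frame -> destination camera frame.
    p_world = T_src_to_world @ np.append(p_src, 1.0)
    p_dst = np.linalg.inv(T_dst_to_world) @ p_world
    # Points behind the destination camera have no valid projection.
    if p_dst[2] <= 0:
        return None
    # Perspective projection with the destination intrinsics.
    uvw = K_dst @ p_dst[:3]
    return uvw[:2] / uvw[2]

# Hypothetical setup: focal length 100 px, principal point (50, 50).
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
T_src = np.eye(4)
T_dst = np.eye(4)
T_dst[0, 3] = 1.0  # destination camera shifted 1 unit along world x

uv_dst = reproject_pixel((50.0, 50.0), 10.0, K, T_src, K, T_dst)
```

In this toy setup, a point at depth 10 seen at the image center of the source camera lands 10 px to the left in the shifted camera (focal length × baseline / depth = 100 × 1 / 10). Because the mapping depends only on the supplied intrinsics and extrinsics, changing the camera configuration changes the correspondence without touching any learned weights, which is the kind of decoupling the abstract describes.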

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 6 · Confidence 4

Strengths

1. The proposed Explicit Camera Modeling (ECM) is a strong technical contribution that directly addresses a major limitation in prior work. Decoupling the model from fixed camera configurations is a well-motivated and crucial step toward creating truly flexible and practical AD simulators. The qualitative results also look great.
2. The identity-aware embedding is inspiring: it maintains consistency for dynamic objects, a common failure point in generative models.
3. The qualitative and qua

Weaknesses

1. The proposed Explicit Camera Modeling enables the model to generalize to unseen parameters. Although qualitative experiments show that the model outperforms previous methods in generalizing across camera configurations, it would be better to include quantitative evaluations with modern feed-forward SfM models, showing that the generated images follow the desired camera parameters.
2. Ablation is an important part and should be included in the main paper.
3. The submission do

Reviewer 02 · Rating 4 · Confidence 4

Strengths

- A novel and compact explicit camera modeling mechanism is proposed.
- Detailed visualization results are provided, offering valuable insights.

Weaknesses

- In Table 3, the perspective-based and attention-based control mechanisms are presented, but it is unclear which methods these mechanisms correspond to.
- The novelty of the approach is not immediately apparent in the methods section, as it contains a lot of detailed explanations about handling different conditions.

Reviewer 03 · Rating 6 · Confidence 5

Strengths

1. The proposed ECM mechanism effectively decouples the model from specific camera parameters and temporal sampling rates by establishing explicit pixel-wise correspondences in 3D space, filling the gap of poor generalization in existing implicit modeling methods.
2. The information-preserving control mechanism, especially the identity-aware extension, successfully mitigates information loss in conditional encoding and injection, improving both controllability and foreground temporal consistency

Weaknesses

1. Using MagicDrive and DreamForge as baselines is insufficient, as they do not support camera parameter generalization. Instead, the paper should conduct a direct comparison with 3D-based generative works like MagicDrive3D[a] to show its advantages.
2. No video-specific evaluation metrics are employed. Benchmarks like W-CODA2024[b], which are tailored for video generation quality and consistency, should be adopted to comprehensively assess temporal performance.
3. Key components in Figure 4

Code & Models

Repositories

swc-17/drivecamsim (Official)


Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Transportation and Mobility Innovations
