SceneCrafter: Controllable Multi-View Driving Scene Editing

Zehao Zhu; Yuliang Zou; Chiyu Max Jiang; Bo Sun; Vincent Casser; Xiukun Huang; Jiahao Wang; Zhenpei Yang; Ruiqi Gao; Leonidas Guibas; Mingxing Tan; Dragomir Anguelov

arXiv:2506.19488·cs.CV·June 25, 2025

TL;DR

SceneCrafter is a versatile multi-view editing framework that enables realistic, controllable, and 3D-consistent modifications of driving scenes from real data, enhancing simulation fidelity for autonomous vehicle development.

Contribution

The paper introduces SceneCrafter, a novel multi-view diffusion-based editing model that addresses key challenges in driving scene editing, including 3D consistency and multi-modality control, with new data generation techniques.

Findings

01. Achieves state-of-the-art realism and controllability in scene editing.

02. Demonstrates effective multi-modality manipulation, including weather and time of day.

03. Ensures 3D consistency across multiple camera views.

Abstract

Simulation is crucial for developing and evaluating autonomous vehicle (AV) systems. Recent literature builds on a new generation of generative models to synthesize highly realistic images for full-stack simulation. However, purely synthetically generated scenes are not grounded in reality and have difficulty inspiring confidence in the relevance of their outcomes. Editing models, on the other hand, leverage source scenes from real driving logs and enable the simulation of different traffic layouts, behaviors, and operating conditions such as weather and time of day. While image editing is an established topic in computer vision, it presents fresh sets of challenges in driving simulation: (1) the need for cross-camera 3D consistency, (2) learning "empty street" priors from driving data with foreground occlusions, and (3) obtaining paired image tuples of varied editing conditions…

Taxonomy

Topics: Computer Graphics and Visualization Techniques · 3D Modeling in Geospatial Applications · Remote Sensing and LiDAR Applications