DriveCtrl: Conditioned Sim-to-Real Driving Video Generation
Haonan Zhao, Yiting Wang, Jingkun Chen, Valentina Donzella, Thomas Bashford-Rogers, Kurt Debattista

TL;DR
DriveCtrl is a depth-conditioned framework that generates realistic, temporally coherent driving videos from simulation, matching real-world style and preserving annotations for autonomous driving applications.
Contribution
DriveCtrl introduces a structure-aware adapter for depth-guided, controllable video synthesis, together with a scalable pipeline for transforming simulator videos into realistic driving footage.
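As a rough illustration of how a depth-conditioning adapter can sit on top of a pretrained backbone, the sketch below adds a scaled depth signal to per-frame backbone features. It is not DriveCtrl's architecture, which this summary does not specify. It assumes a zero-initialized residual injection (a common pattern for adding control signals to frozen generative models), and the names `inject_depth` and `weight` are hypothetical. A single scalar stands in for what would be a learned projection.

```python
from typing import List

# One feature map per frame, stored as H x W nested lists (illustrative only).
Frame = List[List[float]]


def inject_depth(features: List[Frame], depth: List[Frame], weight: float) -> List[Frame]:
    """Return features + weight * depth, frame by frame and pixel by pixel.

    Assumption: `weight` stands in for a learned adapter projection. Starting
    it at 0.0 means the adapter initially leaves the backbone output unchanged.
    """
    if len(features) != len(depth):
        raise ValueError("features and depth must have the same number of frames")
    out: List[Frame] = []
    for f_frame, d_frame in zip(features, depth):
        out.append([
            [f + weight * d for f, d in zip(f_row, d_row)]
            for f_row, d_row in zip(f_frame, d_frame)
        ])
    return out


# Two frames, each a 1 x 2 feature map, with matching depth maps.
features = [[[1.0, 2.0]], [[3.0, 4.0]]]
depth = [[[0.5, 0.5]], [[1.0, 2.0]]]

untouched = inject_depth(features, depth, 0.0)  # zero init: identical to features
guided = inject_depth(features, depth, 0.5)     # depth now shifts the features
```

With the weight at zero the pretrained behaviour is preserved exactly, so training can introduce depth guidance gradually. Whether DriveCtrl uses this initialization is an assumption here, not something the summary states.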
Findings
DriveCtrl outperforms baseline models in realism and temporal consistency.
The generated videos improve perception task performance.
The proposed DVRS metric effectively assesses driving video realism.
Abstract
Large-scale labelled driving video data is essential for training autonomous driving systems. Although simulation offers scalable and fully annotated data, the domain gap between synthetic and real-world driving videos significantly limits its utility for downstream deployment. Existing video generation methods are not well-suited for this task, as they fail to simultaneously preserve scene structure, object dynamics, temporal consistency, and visual realism, all of which are critical for maintaining annotation validity in generated data. In this paper, we present DriveCtrl, a depth-conditioned controllable sim-to-real video generation framework for realistic driving video synthesis. Built upon a pretrained video foundation model, DriveCtrl introduces a structure-aware adapter that enables depth-guided generation while preserving the scene layout and motion patterns of the source…
