TL;DR
ReSim is a controllable, high-fidelity world simulation model for autonomous driving that incorporates diverse behaviors, including hazardous non-expert actions, to improve policy evaluation and planning.
Contribution
It introduces a diffusion-transformer-based world model trained on heterogeneous data, including simulator-generated non-expert behaviors, together with a Video2Reward module that estimates rewards from generated video.
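The pairing of a controllable world model with a reward estimator suggests a simple planning loop: simulate each candidate ego trajectory, score the predicted future, and keep the best one. The sketch below shows that loop under stated assumptions. `simulate` and `estimate_reward` are hypothetical callables standing in for the world model and Video2Reward, and the toy versions are placeholders, not ReSim's actual interfaces or scoring.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a trajectory is a list of (x, y) ego waypoints.
Trajectory = List[Tuple[float, float]]


def select_trajectory(
    candidates: Sequence[Trajectory],
    simulate: Callable[[Trajectory], list],
    estimate_reward: Callable[[list], float],
) -> Tuple[int, List[float]]:
    """Score each candidate by simulating it and rewarding the predicted frames.

    `simulate` stands in for an action-conditioned world model and
    `estimate_reward` for a video-to-reward estimator; both are assumptions.
    Returns the index of the highest-reward candidate and all rewards.
    """
    if not candidates:
        raise ValueError("need at least one candidate trajectory")
    rewards = [estimate_reward(simulate(traj)) for traj in candidates]
    best = max(range(len(rewards)), key=lambda i: rewards[i])
    return best, rewards


# Toy stand-ins for illustration only (not ReSim or Video2Reward).
def toy_simulate(traj: Trajectory) -> list:
    # Pretend each "frame" is just the ego position at that step.
    return list(traj)


def toy_reward(frames: list) -> float:
    # Penalize the largest lateral offset from the lane center (y = 0).
    return -max(abs(y) for _, y in frames)


if __name__ == "__main__":
    candidates = [
        [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)],  # keep lane
        [(0.0, 0.0), (1.0, 1.5), (2.0, 3.0)],  # hard swerve
        [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5)],  # slight drift
    ]
    best, rewards = select_trajectory(candidates, toy_simulate, toy_reward)
    print(best, rewards)
```

Keeping the world model and reward estimator as injected callables lets the same ranking loop be reused with any simulator or scorer. The toy reward here only penalizes lateral drift and says nothing about how ReSim actually scores rollouts.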
Findings
ReSim achieves up to 44% higher visual fidelity.
Controllability improves by over 50% for various actions.
Planning performance on NAVSIM increases by 25%.
Abstract
How can we reliably simulate future driving scenarios under a wide range of ego driving behaviors? Recent driving world models, developed exclusively on real-world driving data composed mainly of safe expert trajectories, struggle to follow hazardous or non-expert behaviors, which are rare in such data. This limitation restricts their applicability to tasks such as policy evaluation. In this work, we address this challenge by enriching real-world human demonstrations with diverse non-expert data collected from a driving simulator (e.g., CARLA), and building a controllable world model trained on this heterogeneous corpus. Starting with a video generator featuring a diffusion transformer architecture, we devise several strategies to effectively integrate conditioning signals and improve prediction controllability and fidelity. The resulting model, ReSim, enables Reliable Simulation of…