Beyond Simulation: Benchmarking World Models for Planning and Causality in Autonomous Driving
Hunter Schofield, Mohammed Elmahgiubi, Kasra Rezaee, and Jinjun Shan

TL;DR
This paper examines whether the metrics used to evaluate world models as traffic simulators in autonomous driving remain suitable when those models serve as pseudo-environments for policy training. It reveals limitations of the existing metrics and proposes new ones to better assess model reliability.
Contribution
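The paper introduces new metrics that measure a world model's sensitivity to uncontrollable objects, and it extends the standard WOSAC evaluation domain to include agents that are causal to the ego vehicle, so that the benchmark better reflects suitability as a pseudo-environment.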
Findings
Existing metrics may not fully capture world model robustness.
Many top models fail under ego-replay scenarios.
Proposed metrics better identify models' sensitivity to uncontrollable factors.
Abstract
World models have become increasingly popular as learned traffic simulators. Recent work has explored replacing traditional traffic simulators with world models for policy training. In this work, we explore the robustness of existing metrics used to evaluate world models as traffic simulators, to see whether the same metrics are suitable for evaluating a world model as a pseudo-environment for policy training. Specifically, we analyze the metametric employed by the Waymo Open Sim-Agents Challenge (WOSAC) and compare world model predictions on standard scenarios where the agents are either fully controlled by the world model or only partially controlled by it (partial replay). Furthermore, since we are interested in evaluating the ego action-conditioned world model, we extend the standard WOSAC evaluation domain to include agents that are causal to the ego vehicle. Our evaluations reveal a significant number of…
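The contrast between full control and partial replay described in the abstract can be illustrated with a toy rollout. This is a minimal sketch under stated assumptions, not the paper's evaluation code: the names `rollout`, `toy_step`, and `mean_displacement` are hypothetical, the one-step model is a hand-written stand-in for a learned world model, and the displacement score is a simple placeholder rather than the WOSAC metametric.

```python
import numpy as np

def rollout(step_fn, logged, replay_mask, history=1):
    """Roll out a scene, overwriting replayed agents with their logged states.

    logged:      (T, N, 2) array of logged x/y positions for N agents.
    replay_mask: (N,) boolean array; True means the agent follows the log.
    step_fn:     hypothetical one-step model mapping (N, 2) states to (N, 2).
    """
    sim = np.empty_like(logged)
    sim[:history] = logged[:history]            # warm-start from logged history
    for t in range(history, logged.shape[0]):
        pred = step_fn(sim[t - 1])              # model predicts every agent
        sim[t] = np.where(replay_mask[:, None], logged[t], pred)
    return sim

def toy_step(state):
    """Assumed stand-in model: agent 0 (ego) drifts, the others react to ego."""
    nxt = state + np.array([1.0, 0.0])          # everyone advances in x
    nxt[0, 1] += 0.5                            # ego drifts sideways
    nxt[1:, 1] += 0.1 * (state[0, 1] - state[1:, 1])  # others steer toward ego
    return nxt

def mean_displacement(sim, logged, agent_mask):
    """Toy score: mean distance to the log over the selected agents."""
    return float(np.linalg.norm(sim - logged, axis=-1)[:, agent_mask].mean())

# Logged scene: 5 steps, 3 agents driving straight in separate lanes.
logged = np.zeros((5, 3, 2))
logged[:, :, 0] = np.arange(5)[:, None]
logged[:, :, 1] = np.arange(3)[None, :]

full_control = np.zeros(3, dtype=bool)          # model controls every agent
ego_replay = np.array([True, False, False])     # ego follows the log

full = rollout(toy_step, logged, full_control)
replay = rollout(toy_step, logged, ego_replay)

others = np.array([False, True, True])          # score only non-ego agents
score_full = mean_displacement(full, logged, others)
score_replay = mean_displacement(replay, logged, others)
```

Because the non-ego agents in this toy model react to the ego, their score changes once the ego is forced back onto its logged path, which is the kind of gap between the two settings that the comparison is meant to expose.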
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Traffic control and management · Transportation and Mobility Innovations
