Latent World Models for Automated Driving: A Unified Taxonomy, Evaluation Framework, and Open Challenges

Rongxiang Zeng; Yongqi Dong

arXiv:2603.09086·cs.RO·March 11, 2026

Open Access

TL;DR

This paper proposes a unifying latent-space framework for world models in automated driving. It organizes the design space, proposes evaluation methods, and outlines research directions aimed at robustness, generalization, and efficiency.

Contribution

It unifies recent advances into a taxonomy, defines evaluation protocols, and highlights open challenges for latent world models in automated driving.

Findings

01. Proposes a unifying taxonomy for latent world models.

02. Introduces evaluation metrics including a closed-loop metric suite.

03. Identifies key research directions for robustness and resource efficiency.

Abstract

Emerging generative world models and vision-language-action (VLA) systems are rapidly reshaping automated driving by enabling scalable simulation, long-horizon forecasting, and capability-rich decision making. Across these directions, latent representations serve as the central computational substrate: they compress high-dimensional multi-sensor observations, enable temporally coherent rollouts, and provide interfaces for planning, reasoning, and controllable generation. This paper proposes a unifying latent-space framework that synthesizes recent progress in world models for automated driving. The framework organizes the design space by the target and form of latent representations (latent worlds, latent actions, latent generators; continuous states, discrete tokens, and hybrids) and by structural priors for geometry, topology, and semantics. Building on this taxonomy, the paper…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications