Latent Chain-of-Thought World Modeling for End-to-End Driving

Shuhan Tan; Kashyap Chitta; Yuxiao Chen; Ran Tian; Yurong You; Yan Wang; Wenjie Luo; Yulong Cao; Philipp Krahenbuhl; Marco Pavone; Boris Ivanovic

arXiv:2512.10226 · cs.CV · April 15, 2026

TL;DR

This paper introduces LCDrive, a latent chain-of-thought model for autonomous driving. It reasons in a learned, action-aligned latent space instead of natural language, which yields faster inference and higher-quality trajectories.

Contribution

The work presents a novel latent CoT reasoning framework for driving that unifies decision making and reasoning in an action-aligned latent space, enhancing inference speed and trajectory quality.

Findings

1. LCDrive achieves faster inference than the baselines.
2. It produces higher-quality driving trajectories.
3. Reinforcement learning further improves its reasoning capabilities.

Abstract

Recent Vision-Language-Action (VLA) models for autonomous driving explore inference-time reasoning as a way to improve driving performance and safety in challenging scenarios. Most prior work uses natural language to express chain-of-thought (CoT) reasoning before producing driving actions. However, text may not be the most efficient representation for reasoning. In this work, we present Latent-CoT-Drive (LCDrive): a model that expresses CoT in a latent language that captures possible outcomes of the driving actions being considered. Our approach unifies CoT reasoning and decision making by representing both in an action-aligned latent space. Instead of natural language, the model reasons by interleaving (1) action-proposal tokens, which use the same vocabulary as the model's output actions; and (2) world model tokens, which are grounded in a learned latent world model and express…
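
The interleaving described in the abstract can be pictured with a toy sketch. Everything below is a hypothetical assumption for illustration and is not LCDrive's implementation. This includes the five-token action vocabulary, the three-number latent state (speed, heading, gap to a stopped lead vehicle), the hand-written kinematic `world_model`, the `score` function and the argmax selection. All of these stand in for components the paper learns. The sketch only shows the shape of the chain: action-proposal tokens drawn from the same vocabulary as the output action, each followed by a world-model token that holds the imagined outcome of that proposal.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

# Hypothetical discrete action vocabulary: token id -> (acceleration m/s^2, yaw rate rad/s).
# Action-proposal tokens and the final output action share this one vocabulary.
ACTION_VOCAB: List[Tuple[float, float]] = [
    (-2.0, 0.0),   # 0: brake
    (0.0, 0.0),    # 1: keep speed
    (1.0, 0.0),    # 2: accelerate
    (0.0, 0.2),    # 3: nudge left
    (0.0, -0.2),   # 4: nudge right
]

SAFE_GAP = 3.0  # metres; hypothetical threshold used by the toy scorer

Latent = Tuple[float, float, float]  # toy latent: (speed, heading, gap to stopped lead car)


@dataclass(frozen=True)
class ActionToken:
    token_id: int  # index into ACTION_VOCAB


@dataclass(frozen=True)
class WorldToken:
    latent: Latent  # imagined outcome of the preceding action proposal


CoTToken = Union[ActionToken, WorldToken]


def world_model(latent: Latent, token_id: int, dt: float = 0.5) -> Latent:
    """Hand-written stand-in for a learned latent world model (one step of simple kinematics)."""
    speed, heading, gap = latent
    accel, yaw_rate = ACTION_VOCAB[token_id]
    new_speed = max(0.0, speed + accel * dt)
    new_heading = heading + yaw_rate * dt
    new_gap = gap - new_speed * dt  # lead vehicle assumed stationary
    return (new_speed, new_heading, new_gap)


def score(latent: Latent) -> float:
    """Hypothetical objective: reward speed, penalize heading change and unsafe gaps."""
    speed, heading, gap = latent
    penalty = 100.0 if gap < SAFE_GAP else 0.0
    return speed - abs(heading) - penalty


def latent_cot(latent: Latent, candidates: List[int], dt: float = 0.5):
    """Build an interleaved chain [a_1, w_1, a_2, w_2, ...] and choose a final action.

    The fixed candidate list and the argmax selection stand in for decisions
    the model itself would generate autoregressively.
    """
    chain: List[CoTToken] = []
    best_id, best_score = candidates[0], float("-inf")
    for token_id in candidates:
        chain.append(ActionToken(token_id))           # action-proposal token
        imagined = world_model(latent, token_id, dt)
        chain.append(WorldToken(imagined))            # world-model token
        s = score(imagined)
        if s > best_score:
            best_id, best_score = token_id, s
    return chain, ActionToken(best_id)  # final action, same vocabulary as proposals


if __name__ == "__main__":
    # 10 m/s, heading 0 rad, 7.75 m behind a stopped vehicle
    chain, final = latent_cot((10.0, 0.0, 7.75), candidates=[2, 1, 0])
    for tok in chain:
        print(tok)
    print("final action:", final.token_id)  # → final action: 0
```

In this toy scenario, accelerating or holding speed leaves less than the assumed 3 m gap after one step. The chain therefore settles on braking (token 0). In LCDrive, by contrast, the world-model tokens are grounded in a learned latent world model rather than hand-written kinematics.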
