Enhancing End-to-End Autonomous Driving with Latent World Model

Yingyan Li; Lue Fan; Jiawei He; Yuqi Wang; Yuntao Chen; Zhaoxiang Zhang; Tieniu Tan

arXiv:2406.08481 · cs.CV · March 3, 2025

TL;DR

This paper introduces LAW, a self-supervised latent world model that enhances scene feature learning for end-to-end autonomous driving, leading to improved trajectory prediction and state-of-the-art results on multiple benchmarks.

Contribution

The paper proposes a novel self-supervised learning approach using the LAtent World model (LAW) to improve scene feature representations in end-to-end driving systems.

Findings

1. LAW achieves state-of-the-art performance on the nuScenes, NAVSIM, and CARLA benchmarks.
2. Self-supervised learning with LAW enhances trajectory prediction accuracy.
3. The approach is effective in both perception-free and perception-based frameworks.

Abstract

In autonomous driving, end-to-end planners directly utilize raw sensor data, enabling them to extract richer scene features and reduce information loss compared to traditional planners. This raises a crucial research question: how can we develop better scene feature representations to fully leverage sensor data in end-to-end driving? Self-supervised learning methods show great success in learning rich feature representations in NLP and computer vision. Inspired by this, we propose a novel self-supervised learning approach using the LAtent World model (LAW) for end-to-end driving. LAW predicts future scene features based on current features and ego trajectories. This self-supervised task can be seamlessly integrated into perception-free and perception-based frameworks, improving scene feature learning and optimizing trajectory prediction. LAW achieves state-of-the-art performance across…
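The core self-supervised task described in the abstract (predict future scene features from current features and the ego trajectory) can be illustrated with a toy sketch. This is not the authors' implementation: the linear predictor stands in for whatever network LAW uses, the mean-squared-error objective is an assumption (the abstract does not name the loss), and every name here (`predict_next_latent`, `LATENT_DIM`, `NUM_WAYPOINTS`) is hypothetical.

```python
import random


def linear(weights, bias, x):
    """Apply y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def predict_next_latent(latent, waypoints, weights, bias):
    """Predict the next-frame latent from the current latent and ego waypoints.

    The (x, y) waypoints are flattened and concatenated with the latent,
    so the prediction is conditioned on the planned ego motion.
    """
    flat_traj = [coord for point in waypoints for coord in point]
    return linear(weights, bias, latent + flat_traj)


def mse(pred, target):
    """Mean squared error between two equal-length vectors (assumed loss)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


random.seed(0)
LATENT_DIM, NUM_WAYPOINTS = 4, 3           # hypothetical toy sizes
in_dim = LATENT_DIM + 2 * NUM_WAYPOINTS    # latent plus flattened (x, y) pairs

# Randomly initialised linear predictor; a real system would train a network.
weights = [[random.uniform(-0.1, 0.1) for _ in range(in_dim)]
           for _ in range(LATENT_DIM)]
bias = [0.0] * LATENT_DIM

# Stand-ins for encoder outputs at the current and next frame.
current_latent = [random.gauss(0, 1) for _ in range(LATENT_DIM)]
next_latent = [random.gauss(0, 1) for _ in range(LATENT_DIM)]
ego_waypoints = [(0.0, 1.0), (0.1, 2.0), (0.3, 3.0)]

predicted = predict_next_latent(current_latent, ego_waypoints, weights, bias)
world_model_loss = mse(predicted, next_latent)
```

The supervision target is simply the feature extracted from the next frame, so no manual labels are required. In a trained system this loss would be minimized alongside the planning loss rather than computed once on random vectors.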

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 5

Strengths

1. Novel integration of world model concepts into end-to-end driving.
2. Comprehensive experimental validation across multiple benchmarks. Demonstrates practical improvements in both closed-loop and open-loop settings.

Weaknesses

1. Limited discussion of computational overhead: no analysis of inference time or model size. Autonomous driving systems must make decisions in real time, typically requiring processing speeds of at least 10-20 Hz (decisions every 50-100 ms). Without inference time analysis, it's unclear whether LAW is suitable for real deployment on edge computing devices.
2. No discussion of robustness to adverse weather/lighting conditions. As in Appendix A.1, the augmentation is claimed to enhance the robustnes

Reviewer 02 · Rating 8 · Confidence 4

Strengths

The proposed LAW framework uses a self-supervised method to significantly reduce the need for heavy annotation, addressing the data scalability challenge of many existing methods. The detailed ablation studies, latency analyses, and visualizations give readers clear and comprehensive information to understand and reproduce the work.

Weaknesses

The view selection strategy is a valuable insight for improving the efficiency of the method, but it adds complexity to the overall framework. Although the performance drop is minimal, the view selection strategy does not seem to fully capture the informative scenes in driving scenarios. More discussion or analysis of what caused the performance drop, or of how the Latent World Model could mitigate it, would make the work more complete.

Reviewer 03 · Rating 8 · Confidence 5

Strengths

1. Introduction of the Latent World Model (LAW) to predict future scene latents from current scene latents and ego trajectories.
2. Demonstrated universality across various common autonomous driving paradigms, i.e., perception-free and perception-based frameworks.
3. Extensive experiments conducted on multiple benchmarks, achieving state-of-the-art performance on real-world open-loop datasets like nuScenes and the simulator-based closed-loop CARLA benchmark.

Weaknesses

See the Questions section.

Code & Models

Repositories

bravegroup/law · PyTorch · Official

Videos

Enhancing End-to-End Autonomous Driving with Latent World Model · SlidesLive

Taxonomy

Topics

Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications