AdaWM: Adaptive World Model based Planning for Autonomous Driving
Hang Wang, Xin Ye, Feng Tao, Chenbin Pan, Abhirup Mallik, Burhaneddin Yaman, Liu Ren, Junshan Zhang

TL;DR
AdaWM enhances autonomous driving by adaptively fine-tuning world models and policies, effectively addressing distribution shifts to improve robustness and efficiency in complex driving scenarios.
Contribution
This paper introduces AdaWM, a novel adaptive planning method that identifies and mitigates model and policy mismatches during finetuning in autonomous driving.
Findings
AdaWM significantly improves finetuning efficiency.
It enhances robustness in CARLA driving tasks.
The method reduces performance degradation during adaptation.
Abstract
World model based reinforcement learning (RL) has emerged as a promising approach for autonomous driving, which learns a latent dynamics model and uses it to train a planning policy. To speed up the learning process, the pretrain-finetune paradigm is often used, where online RL is initialized by a pretrained model and a policy learned offline. However, naively performing such initialization in RL may result in dramatic performance degradation during the online interactions in the new task. To tackle this challenge, we first analyze the performance degradation and identify two primary root causes therein: the mismatch of the planning policy and the mismatch of the dynamics model, due to distribution shift. We further analyze the effects of these factors on performance degradation during finetuning, and our findings reveal that the choice of finetuning strategies plays a pivotal role in…
Peer Reviews
Decision: ICLR 2025 Poster
1. A motivating point-of-entry for analyzing distributional shift during adaptation in model-based planning for autonomous driving. Quantified by the distributional gap of dynamic model or planning policy, an adaptive finetuning framework (AdaWM) is proposed that selectively fine-tunes world model or planning policy given switch-based dominated mismatch assessment.
2. Analytical characterization and assessment of the performance gap for world model and planning policy.
3. Improved adaptation re
1. Unverified model configurations and experimental settings:
* The detailed setup for both the world model (dynamic) and the pretrained planning policy in AdaWM is unclear.
* The paper lacks comprehensive configurations and descriptions of the experimental scenarios in CARLA.
* The metrics selected are not entirely objective, as key benchmarks such as Driving Score and Route Completion are missing. Additionally, using a self-defined reward cannot ensure objective measurement of performance
- This paper makes it easy for readers to grasp the main idea.
- This paper includes theoretical analysis.
- The authors need to provide visualizations of the model's prediction quality and trajectory planning quality to further demonstrate the model's performance.
- The authors should compare against more model-based/world model planning methods in autonomous driving, because UniAD and VAD are end-to-end methods and Dreamer is not specifically designed for the autonomous driving setting.
- The method requires model and policy rollouts, which may introduce significant safety issues in the real world. Although th
AdaWM presents a novel world model approach to handle distribution shifts during finetuning, identifying mismatches in the dynamics model and policy as the main causes of performance drops. With its mismatch identification and alignment-driven finetuning, AdaWM mitigates performance degradation. Extensive experiments in CARLA with challenging tasks demonstrate AdaWM’s clear advantage in both success rate and time-to-collision over baseline models.
The paper could discuss how AdaWM adapts to different task complexities, like high-density urban areas versus highways, as these may impact its adaptability and performance. This leaves some concerns about AdaWM’s robustness across highly divergent scenarios. Mismatch identification relies on total variation distance, which may add computational load in real time. It would be helpful to explore the complexity of this approach in more detail. The LoRA-based low-rank adaptation for dynamics mod
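The reviews above describe mismatch identification based on total variation distance, followed by a switch that finetunes whichever component (dynamics model or planning policy) shows the dominant mismatch. The following is a minimal sketch of that idea for discrete distributions only. The function names, the `threshold` parameter, and the comparison rule are hypothetical and chosen for illustration; this page does not state the paper's actual criterion.

```python
import numpy as np


def total_variation(p, q):
    """Total variation distance between two discrete distributions
    defined on the same finite support: 0.5 * sum_i |p_i - q_i|."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return 0.5 * np.abs(p - q).sum()


def choose_update(dynamics_gap, policy_gap, threshold=1.0):
    """Hypothetical switch rule (an assumption, not the paper's rule):
    finetune the dynamics model when its mismatch dominates the policy
    mismatch by a factor of `threshold`, otherwise finetune the policy."""
    if dynamics_gap > threshold * policy_gap:
        return "dynamics_model"
    return "policy"


# Toy example: compare pretrained and newly observed distributions.
dyn_gap = total_variation([0.7, 0.2, 0.1], [0.4, 0.4, 0.2])
pol_gap = total_variation([0.5, 0.5], [0.6, 0.4])
print(choose_update(dyn_gap, pol_gap))
```

For a finite support of size n, the distance itself costs O(n) per comparison. The computational concern raised in the review is more likely about estimating these distributions online, which this sketch does not address.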
Taxonomy
Topics: Robotic Path Planning Algorithms · Artificial Intelligence in Games
Methods: Entropy Regularization · Proximal Policy Optimization · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · CARLA: An Open Urban Driving Simulator
