Large Multimodal Models for Embodied Intelligent Driving: The Next Frontier in Self-Driving?

Long Zhang; Yuchen Xia; Bingqing Wei; Zhen Liu; Shiwen Mao; Zhu Han; Mohsen Guizani

arXiv:2601.08434 · cs.RO · January 21, 2026

Open Access

TL;DR

This paper proposes a hybrid decision framework combining large multimodal models and deep reinforcement learning to enhance embodied intelligent driving, enabling continuous learning and joint decision-making for autonomous vehicles.

Contribution

It introduces a novel semantics and policy dual-driven hybrid framework integrating LMMs and DRL for improved autonomous driving capabilities.

Findings

01 Framework outperforms existing methods in lane-change planning tasks

02 Enables continuous learning through embodied AI interactions

03 Facilitates joint decision-making for autonomous driving systems

Abstract

The advent of Large Multimodal Models (LMMs) offers a promising technology to tackle the limitations of modular design in autonomous driving, which often falters in open-world scenarios requiring sustained environmental understanding and logical reasoning. In addition, embodied artificial intelligence facilitates policy optimization through closed-loop interactions to achieve continuous learning, thereby advancing autonomous driving toward embodied intelligent (EI) driving. However, relying solely on LMMs to enhance EI driving, without joint decision-making, constrains this capability. This article introduces a novel semantics and policy dual-driven hybrid decision framework to tackle this challenge, ensuring continuous learning and joint decision-making. The framework merges LMMs for semantic understanding and cognitive representation, and deep reinforcement learning…

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Human-Automation Interaction and Safety