LMAD: Integrated End-to-End Vision-Language Model for Explainable Autonomous Driving

Nan Song; Bozhou Zhang; Xiatian Zhu; Jiankang Deng; Li Zhang

arXiv:2508.12404 · cs.CV · August 19, 2025

TL;DR

This paper introduces LMAD, a novel vision-language framework for autonomous driving that enhances scene understanding and explainability by integrating specialized adapters and comprehensive scene reasoning, improving driving reasoning performance.

Contribution

The paper presents a new end-to-end vision-language model tailored for autonomous driving, incorporating scene interaction and expert adapters for better spatial awareness and explainability.

Findings

1. Significantly improves performance on driving reasoning tasks
2. Sets new standards in explainable autonomous driving
3. Compatible with existing vision-language models

Abstract

Large vision-language models (VLMs) have shown promising capabilities in scene understanding, enhancing the explainability of driving behaviors and interactivity with users. Existing methods primarily fine-tune VLMs on on-board multi-view images and scene reasoning text, but this approach often lacks the holistic and nuanced scene recognition and powerful spatial awareness required for autonomous driving, especially in complex situations. To address this gap, we propose a novel vision-language framework tailored for autonomous driving, called LMAD. Our framework emulates modern end-to-end driving paradigms by incorporating comprehensive scene understanding and a task-specialized structure with VLMs. In particular, we introduce preliminary scene interaction and specialized expert adapters within the same driving task structure, which better align VLMs with autonomous driving scenarios.…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare