AutoDriDM: An Explainable Benchmark for Decision-Making of Vision-Language Models in Autonomous Driving

Zecong Tang; Zixu Wang; Yifei Wang; Weitong Lian; Tianjian Gao; Haoran Li; Tengju Ru; Lingyi Meng; Zhejun Cui; Yichen Zhu; Qi Kang; Kaixuan Wang; Yu Zhang

arXiv:2601.14702 · cs.AI · January 22, 2026

TL;DR

AutoDriDM introduces a comprehensive, decision-focused benchmark for evaluating vision-language models in autonomous driving, emphasizing reasoning and decision-making over perception alone.

Contribution

The paper presents AutoDriDM, a novel benchmark with 6,650 questions across Object, Scene, and Decision dimensions, and analyzes perception-decision boundaries and reasoning failures in VLMs.

Findings

1. Weak correlation between perception and decision performance.

2. Identification of key failure modes like logical reasoning errors.

3. Introduction of an automated annotation analyzer model.

Abstract

Autonomous driving is a highly challenging domain that requires reliable perception and safe decision-making in complex scenarios. Recent vision-language models (VLMs) demonstrate reasoning and generalization abilities, opening new possibilities for autonomous driving; however, existing benchmarks and metrics overemphasize perceptual competence and fail to adequately assess decision-making processes. In this work, we present AutoDriDM, a decision-centric, progressive benchmark with 6,650 questions across three dimensions: Object, Scene, and Decision. We evaluate mainstream VLMs to delineate the perception-to-decision capability boundary in autonomous driving, and our correlation analysis reveals weak alignment between perception and decision-making performance. We further conduct explainability analyses of models' reasoning processes, identifying key failure modes such as logical…
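The correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which statistic the authors use, so the Pearson coefficient here is an assumption, and the per-model accuracy values are hypothetical placeholders, not results from the paper. The function name `pearson` is likewise only illustrative.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the mean (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL per-model accuracies (one entry per evaluated VLM).
# These are placeholder numbers for illustration, not figures from the paper.
perception_acc = [0.62, 0.71, 0.55, 0.80, 0.68]
decision_acc = [0.48, 0.44, 0.51, 0.47, 0.53]

r = pearson(perception_acc, decision_acc)
print(f"perception vs. decision correlation: r = {r:.3f}")
```

A coefficient near zero would indicate that a model's perception score says little about its decision score, which is the kind of weak alignment the abstract describes. A rank-based statistic such as Spearman's would be a reasonable alternative when only a handful of models are compared.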

Code & Models

Datasets

ColamentosZJU/AutoDriDM
dataset · 18 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning