CoT4AD: A Vision-Language-Action Model with Explicit Chain-of-Thought Reasoning for Autonomous Driving

Zhaohui Wang; Tengbo Yu; Hao Tang

arXiv:2511.22532 · cs.CV · December 1, 2025

Open Access

TL;DR

CoT4AD adds explicit Chain-of-Thought reasoning to vision-language-action models for autonomous driving, targeting stronger numerical and causal reasoning and better decision-making in complex scenarios.

Contribution

The paper presents a CoT-based VLA framework that explicitly models the reasoning process to strengthen numerical and causal reasoning in autonomous driving tasks.

Findings

01. Achieves state-of-the-art results on nuScenes and Bench2Drive benchmarks.

02. Improves reasoning capabilities in complex driving scenarios.

03. Demonstrates robustness in dynamic environments.

Abstract

Vision-Language-Action (VLA) models have recently attracted growing attention in end-to-end autonomous driving for their strong reasoning capabilities and rich world knowledge. However, existing VLAs often suffer from limited numerical reasoning ability and overly simplified input-output mappings, which hinder their performance in complex driving scenarios requiring step-by-step causal reasoning. To address these challenges, we propose CoT4AD, a novel VLA framework that introduces Chain-of-Thought (CoT) reasoning for autonomous driving to enhance both numerical and causal reasoning in Vision-Language Models (VLMs). CoT4AD integrates visual observations and language instructions to perform semantic reasoning, scene understanding, and trajectory planning. During training, it explicitly models a perception-question-prediction-action CoT to align the reasoning space with the action space…
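
The perception-question-prediction-action ordering described in the abstract can be pictured as a staged training target. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `CoTSample` class, the `format_cot_target` function, the angle-bracket stage tags, and the (x, y) waypoint format are all hypothetical and chosen only to illustrate how four reasoning stages might be serialized in a fixed order.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Stage order follows the abstract; the tag names themselves are illustrative only.
STAGES = ("perception", "question", "prediction", "action")


@dataclass
class CoTSample:
    """Hypothetical container for one chain-of-thought training example."""
    perception: str                     # what the model observes in the scene
    question: str                       # the reasoning question posed about the scene
    prediction: str                     # expected behaviour of other agents
    action: List[Tuple[float, float]]   # assumed (x, y) waypoints in metres


def format_cot_target(sample: CoTSample) -> str:
    """Serialize the four stages, in order, into one text target."""
    waypoints = " ".join(f"({x:.2f}, {y:.2f})" for x, y in sample.action)
    parts = {
        "perception": sample.perception,
        "question": sample.question,
        "prediction": sample.prediction,
        "action": waypoints,
    }
    return "\n".join(f"<{stage}> {parts[stage]}" for stage in STAGES)


# Made-up example values, for illustration only.
example = CoTSample(
    perception="Pedestrian near the crosswalk ahead.",
    question="Should the ego vehicle yield?",
    prediction="The pedestrian is likely to cross.",
    action=[(0.0, 0.0), (0.0, 1.5)],
)
target = format_cot_target(example)
print(target)
```

Keeping the action stage last means the trajectory text is always conditioned on the earlier reasoning stages, which is one plausible reading of aligning the reasoning space with the action space; the paper itself may tokenize or supervise these stages differently.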

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications