C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving

Kefei Tian; Yuansheng Lian; Kai Yang; Xiangdong Chen; Shen Li

arXiv:2605.10744·cs.CV·May 12, 2026

TL;DR

This paper introduces C-CoT, a vision-language model framework for safe autonomous driving that employs counterfactual reasoning to improve decision robustness in complex urban scenarios.

Contribution

The paper proposes a counterfactual chain-of-thought framework that uses a structured meta-action evaluation tree for causal safety reasoning in autonomous driving.

Findings

01  Achieved 81.9% risk prediction recall.

02  Reduced collision rate to 3.52%.

03  Lowered L2 error to 1.98 meters.
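
This page does not say how the three reported metrics are computed. As a rough illustration only, the sketch below uses the common textbook definitions: recall as true positives over all actual positives, collision rate as the fraction of planned trajectories that collide, and L2 error as the mean Euclidean distance between planned and reference waypoints. The function names and the sample inputs are hypothetical, and the paper's actual evaluation protocol may differ.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) waypoint in meters; an assumed layout


def recall(true_positives: int, false_negatives: int) -> float:
    """Fraction of actual risk cases that were flagged."""
    total = true_positives + false_negatives
    return true_positives / total if total else 0.0


def collision_rate(collided: Sequence[bool]) -> float:
    """Fraction of evaluated plans marked as colliding."""
    return sum(collided) / len(collided) if collided else 0.0


def mean_l2_error(planned: Sequence[Point], reference: Sequence[Point]) -> float:
    """Mean Euclidean distance between matching waypoints."""
    if not planned or len(planned) != len(reference):
        raise ValueError("trajectories must be non-empty and equally long")
    distances = [math.dist(p, r) for p, r in zip(planned, reference)]
    return sum(distances) / len(distances)


# Toy values, not data from the paper.
print(recall(true_positives=3, false_negatives=1))
print(collision_rate([True, False, False, False]))
print(mean_l2_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)]))
```

Under these definitions, higher recall is better, while lower collision rate and lower L2 error are better, which matches the direction of the findings listed above.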

Abstract

Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-risk situations. While vision-language models (VLMs) offer promising approaches for safe decision-making in these environments, most current approaches lack reflective and causal reasoning, thereby limiting their overall robustness. To address this, we propose a counterfactual chain-of-thought (C-CoT) framework that leverages VLMs to decompose driving decisions into five sequential stages: scene description, critical object identification, risk prediction, counterfactual risk reasoning, and final action planning. Within the counterfactual reasoning stage, we introduce a…
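
The five stages named in the abstract can be pictured as a sequential prompting loop in which each stage sees the answers of the stages before it. The following is a minimal sketch under that assumption, not the authors' implementation: only the stage names come from the abstract, while `run_chain`, `stub_vlm`, the prompt layout, and the example scene string are hypothetical. The meta-action evaluation tree is not modeled here because the abstract text available on this page is truncated before describing it.

```python
from typing import Callable, Dict, List

# Stage names are taken from the abstract; everything else is illustrative.
STAGES: List[str] = [
    "scene description",
    "critical object identification",
    "risk prediction",
    "counterfactual risk reasoning",
    "final action planning",
]


def run_chain(vlm: Callable[[str], str], scene: str) -> Dict[str, str]:
    """Query the model once per stage, feeding earlier answers forward."""
    outputs: Dict[str, str] = {}
    for stage in STAGES:
        # Earlier stage answers become context for the current stage.
        context = "\n".join(f"[{name}] {text}" for name, text in outputs.items())
        prompt = f"Scene input: {scene}\n{context}\nTask: {stage}"
        outputs[stage] = vlm(prompt)
    return outputs


def stub_vlm(prompt: str) -> str:
    """Placeholder model that echoes the requested task, so the sketch runs offline."""
    task = prompt.rsplit("Task: ", 1)[1]
    return f"<answer for {task}>"


result = run_chain(stub_vlm, "unprotected left turn at an urban intersection")
for stage, answer in result.items():
    print(f"{stage}: {answer}")
```

Swapping `stub_vlm` for a real model call would keep the same control flow; the design choice illustrated is simply that planning comes last and is conditioned on the risk and counterfactual answers.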
