C-CoT: Counterfactual Chain-of-Thought with Vision-Language Models for Safe Autonomous Driving
Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li

TL;DR
This paper introduces C-CoT, a vision-language model framework for safe autonomous driving that employs counterfactual reasoning to improve decision robustness in complex urban scenarios.
Contribution
It proposes a novel counterfactual chain-of-thought framework with a structured meta-action evaluation tree for causal safety reasoning in autonomous driving.
Findings
Achieved 81.9% risk prediction recall.
Reduced collision rate to 3.52%.
Lowered L2 error to 1.98 meters.
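For readers unfamiliar with these three metrics, the sketch below shows how they are commonly computed. It is an illustration on made-up toy data, not the paper's evaluation code: the function names are hypothetical, and the exact definitions used by the authors (for example, the planning horizon over which L2 error is averaged) are not stated in this summary.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]

def recall(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """Fraction of truly risky cases that were flagged as risky."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fn = sum(1 for p, a in zip(predicted, actual) if (not p) and a)
    return tp / (tp + fn) if (tp + fn) else 0.0

def mean_l2_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Average Euclidean distance (meters) between planned and reference waypoints."""
    dists = [math.hypot(px - tx, py - ty) for (px, py), (tx, ty) in zip(pred, truth)]
    return sum(dists) / len(dists)

def collision_rate(collided: Sequence[bool]) -> float:
    """Fraction of evaluated scenarios in which the planned trajectory collides."""
    return sum(1 for c in collided if c) / len(collided)

# Toy data, invented purely for illustration.
risk_recall = recall([True, True, False, True], [True, False, True, True])
l2 = mean_l2_error([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)])
rate = collision_rate([False, True, False, False])
```

Higher is better for recall, while lower is better for collision rate and L2 error.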
Abstract
Safety-critical planning in complex environments, particularly at urban intersections, remains a fundamental challenge for autonomous driving. Existing methods, whether rule-based or data-driven, frequently struggle to capture complex scene semantics, infer potential risks, and make reliable decisions in rare, high-risk situations. While vision-language models (VLMs) offer a promising route to safe decision-making in these environments, most current approaches lack reflective and causal reasoning, which limits their overall robustness. To address this, we propose a counterfactual chain-of-thought (C-CoT) framework that leverages VLMs to decompose driving decisions into five sequential stages: scene description, critical object identification, risk prediction, counterfactual risk reasoning, and final action planning. Within the counterfactual reasoning stage, we introduce a…
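The five sequential stages named in the abstract can be pictured as a simple staged prompting loop. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: only the stage names come from the abstract, while `run_staged_reasoning`, `stub_vlm`, and the prompt wording are hypothetical, and the stub stands in for a real VLM call. The component introduced inside the counterfactual reasoning stage is not modeled, since the excerpt above is truncated before describing it.

```python
from typing import Callable, Dict, List

# Stage names are taken from the abstract; everything else here is hypothetical.
STAGES: List[str] = [
    "scene description",
    "critical object identification",
    "risk prediction",
    "counterfactual risk reasoning",
    "final action planning",
]

def run_staged_reasoning(query_vlm: Callable[[str], str], scene: str) -> Dict[str, str]:
    """Run the stages in order, feeding each stage the outputs of earlier ones."""
    outputs: Dict[str, str] = {}
    for stage in STAGES:
        # Earlier answers become context for the next stage's prompt.
        context = "\n".join(f"{name}: {text}" for name, text in outputs.items())
        prompt = f"Scene: {scene}\n{context}\nTask: {stage}"
        outputs[stage] = query_vlm(prompt)
    return outputs

def stub_vlm(prompt: str) -> str:
    """Stand-in for a real VLM: echoes the task named on the prompt's last line."""
    task = prompt.rsplit("Task: ", 1)[-1]
    return "stub answer for " + task

result = run_staged_reasoning(stub_vlm, "unprotected left turn at an urban intersection")
for stage, answer in result.items():
    print(f"{stage} -> {answer}")
```

Passing the model in as a callable keeps the loop independent of any particular VLM API, so the stub can be swapped for a real client without touching the stage logic.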
