CoC-VLA: Delving into Adversarial Domain Transfer for Explainable Autonomous Driving via Chain-of-Causality Visual-Language-Action Model

Dapeng Zhang; Fei Shen; Rui Zhao; Yinda Chen; Peng Zhi; Chenyang Li; Rui Zhou; Qingguo Zhou

arXiv:2511.19914 · cs.RO · November 26, 2025

TL;DR

This paper introduces CoC-VLA, an adversarial transfer framework that leverages visual-language models to transfer complex, long-tail driving capabilities from simulation to real-world autonomous driving, enhancing reasoning and interpretability.

Contribution

It proposes a novel Chain-of-Causality Visual-Language Model architecture and an adversarial transfer method that use simulation data of rare, long-tail scenarios to improve real-world autonomous driving.
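The summary names an adversarial transfer method but gives no equations. As a rough illustration of the general idea only, here is a toy, pure-Python sketch of adversarial feature alignment in the style of ADDA (Tzeng et al., 2017). This is a swapped-in textbook technique, not the CoC-VLA method. A logistic domain discriminator learns to tell simulation features (label 1) from real features (label 0). Meanwhile, a shift-only "target encoder" is trained with inverted labels so that real features become indistinguishable from simulated ones. The function `adversarial_align`, the 1-D features, and the learning rates are all hypothetical.

```python
import math


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def mean(xs: list[float]) -> float:
    return sum(xs) / len(xs)


def adversarial_align(sim: list[float], real: list[float],
                      steps: int = 5000, lr_d: float = 0.1,
                      lr_t: float = 0.1) -> float:
    """Toy ADDA-style alignment (hypothetical illustration, not CoC-VLA).

    A logistic discriminator D(z) = sigmoid(w*z + b) learns to output 1 for
    simulation features and 0 for real features; a shift-only "target
    encoder" z = x + t is trained to make real features look simulated.
    """
    w, b, t = 0.0, 0.0, 0.0
    for _ in range(steps):
        # 1) Discriminator step: gradient descent on binary cross-entropy.
        gw = gb = 0.0
        for z in sim:                      # label 1: dBCE/du = sigmoid(u) - 1
            g = sigmoid(w * z + b) - 1.0
            gw += g * z / len(sim)
            gb += g / len(sim)
        for x in real:                     # label 0: dBCE/du = sigmoid(u)
            z = x + t
            g = sigmoid(w * z + b)
            gw += g * z / len(real)
            gb += g / len(real)
        w -= lr_d * gw
        b -= lr_d * gb

        # 2) Target-encoder step: descend -log D(x + t) (inverted labels),
        #    i.e. push shifted real features toward the "simulation" side.
        gt = 0.0
        for x in real:
            gt -= (1.0 - sigmoid(w * (x + t) + b)) * w / len(real)
        t -= lr_t * gt
    return t


# Hypothetical 1-D "features": same spread, offset by 3 between domains.
sim = [1.0, 2.0, 3.0]
real = [-2.0, -1.0, 0.0]
t = adversarial_align(sim, real)
gap = mean([x + t for x in real]) - mean(sim)
print(f"learned shift t = {t:.3f}, remaining mean gap = {gap:.3f}")
```

In this toy, the shift is driven purely by the adversarial signal. A practical system would also keep a task loss (for example on driving actions) so that aligned features stay useful. The sketch omits that loss for brevity.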

Findings

1. Effective transfer of long-tail driving capabilities from simulation to real-world.
2. Enhanced reasoning and interpretability in autonomous driving systems.
3. Successful integration of visual-language models with adversarial training.

Abstract

Autonomous driving represents a prominent application of artificial intelligence. Recent approaches have shifted from focusing solely on common scenarios to addressing complex, long-tail situations such as subtle human behaviors, traffic accidents, and non-compliant driving patterns. Given the demonstrated capabilities of large language models (LLMs) in understanding visual and natural language inputs and following instructions, recent methods have integrated LLMs into autonomous driving systems to enhance reasoning, interpretability, and performance across diverse scenarios. However, existing methods typically rely either on real-world data, which is suitable for industrial deployment, or on simulation data tailored to rare or hard-case scenarios. Few approaches effectively integrate the complementary advantages of both data sources. To address this limitation, we propose a novel…

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Generative Adversarial Networks and Image Synthesis