Accelerating Structured Chain-of-Thought in Autonomous Vehicles

Yi Gu; Yan Wang; Yuxiao Chen; Yurong You; Wenjie Luo; Yue Wang; Wenhao Ding; Boyi Li; Heng Yang; Boris Ivanovic; Marco Pavone

arXiv:2602.02864·cs.RO·February 4, 2026

TL;DR

This paper introduces FastDriveCoT, a parallel decoding method that accelerates structured Chain-of-Thought reasoning in autonomous vehicles, significantly reducing inference latency while maintaining reasoning quality.

Contribution

We propose a novel parallel decoding approach for structured CoT that decomposes reasoning into sub-tasks, enabling concurrent generation and faster inference in autonomous driving models.

Findings

01

Achieved 3-4× speedup in CoT generation

02

Reduced end-to-end latency significantly

03

Maintained downstream task performance
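How much a 3-4× CoT speedup shortens end-to-end latency depends on how much of the baseline latency is spent decoding the CoT. The sketch below applies Amdahl's law to make that relationship concrete. The CoT latency shares used here (50%, 70%, 90%) are illustrative assumptions, not figures from the paper, and `end_to_end_speedup` is a hypothetical helper name.

```python
def end_to_end_speedup(cot_fraction: float, cot_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the CoT share of latency is accelerated."""
    if not 0.0 <= cot_fraction <= 1.0:
        raise ValueError("cot_fraction must be in [0, 1]")
    if cot_speedup <= 0:
        raise ValueError("cot_speedup must be positive")
    # The non-CoT share keeps its cost; the CoT share shrinks by cot_speedup.
    return 1.0 / ((1.0 - cot_fraction) + cot_fraction / cot_speedup)


# Assumed, illustrative shares of baseline latency spent on CoT decoding.
for fraction in (0.5, 0.7, 0.9):
    for speedup in (3.0, 4.0):
        overall = end_to_end_speedup(fraction, speedup)
        print(f"CoT share {fraction:.0%}, CoT speedup {speedup:.0f}x "
              f"-> end-to-end {overall:.2f}x")
```

Under these assumptions, the end-to-end gain is always smaller than the CoT speedup itself, and it approaches the CoT speedup only as CoT decoding comes to dominate total latency.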

Abstract

Chain-of-Thought (CoT) reasoning enhances the decision-making capabilities of vision-language-action models in autonomous driving, but its autoregressive nature introduces significant inference latency, making it impractical for real-time applications. To address this, we introduce FastDriveCoT, a novel parallel decoding method that accelerates template-structured CoT. Our approach decomposes the reasoning process into a dependency graph of distinct sub-tasks, such as identifying critical objects and summarizing traffic rules, some of which can be generated in parallel. By generating multiple independent reasoning steps concurrently within a single forward pass, we significantly reduce the number of sequential computations. Experiments demonstrate a 3-4× speedup in CoT generation and a substantial reduction in end-to-end latency across various model architectures, all while…
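The idea of decoding independent sub-tasks together can be illustrated with a minimal sketch that groups the fields of a template into levels, where every field in a level depends only on fields from earlier levels. This is a plain topological layering written for illustration; it is not the paper's scheduling algorithm, and the field names and dependencies in `TEMPLATE_DEPS` are hypothetical.

```python
def parallel_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group fields into levels; each field depends only on fields in earlier levels."""
    remaining = {field: set(parents) for field, parents in deps.items()}
    done: set[str] = set()
    levels: list[list[str]] = []
    while remaining:
        # A field is ready once all of its parents have been generated.
        ready = sorted(f for f, parents in remaining.items() if parents <= done)
        if not ready:
            raise ValueError("cycle or unknown dependency in template graph")
        levels.append(ready)
        done.update(ready)
        for field in ready:
            del remaining[field]
    return levels


# Hypothetical template: field -> fields it must wait for.
TEMPLATE_DEPS = {
    "weather": set(),
    "lanes": set(),
    "critical_objects": set(),
    "traffic_rules": {"lanes"},
    "decision": {"weather", "critical_objects", "traffic_rules"},
}

levels = parallel_levels(TEMPLATE_DEPS)
for i, level in enumerate(levels):
    print(f"level {i}: {level}")
print(f"{len(TEMPLATE_DEPS)} fields decoded in {len(levels)} sequential stages")
```

In this toy graph, five fields collapse into three sequential stages. The actual saving in a real system also depends on field lengths, since fields in the same stage finish after different numbers of tokens.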

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- Good overall novelty for AV: the presented approach is novel within the field of AV reasoning.
- Good technical innovation: the authors combine the structured CoT template with the dynamic programming algorithm while maintaining zero extra FLOPs.
- Good empirical results: a 3-4× speedup in CoT reasoning and therefore 2× faster E2E inference.
- Good empirical results with different VLMs such as Qwen2, Qwen2.5, and Qwen3.
- Good comprehensive ablation studies for ef

Weaknesses

The paper shows a very good technical approach, but there are strong weaknesses that call the paper's overall impact into question:
- The method relies on highly structured reasoning templates that the authors selected for the driving task. It is unclear to the reader whether this enables generalization at all. My concern is that this approach will not generalize well to open-ended reasoning tasks, which are unfortunately something we see in AV every day.
- This dependenc

Reviewer 02 · Rating 6 · Confidence 2

Strengths

- The reviewer found the proposed idea of predefining a CoT template and decoding independent fields in parallel using a dependency graph to be interesting.
- Interestingly, parallel decoding even slightly improves template adherence: trajectory ADE at 3 seconds for Qwen2.5 VL 3B improves from 0.511 to 0.482, showing that structure can help quality.
- The experiments in Table 1 and the ablation-style analysis in Figure 4 show a consistent 3× to 4× CoT speedup with only small drops in some l

Weaknesses

- Table 1 only compares no CoT and standard autoregressive CoT, but it should also compare against shorter skeleton-of-thought decoding or speculative decoding baselines, which are natural comparisons for speed claims.
- Typos the reviewer noticed: Line 115: dependecies -> dependencies; Figure 3 caption: independency -> independence; Line 352: diving -> driving.
- See questions below.

Reviewer 03 · Rating 2 · Confidence 3

Strengths

- This paper addresses a very practical and significant problem. The robustness and interpretability of LLMs are crucial for AV systems, but their latency is a major obstacle to deployment.
- The method of decomposing CoT into a dependency graph and achieving parallel decoding in a single forward pass via a custom attention mask appears sensible. This approach can effectively utilize the parallel computing capabilities of modern GPUs and fully reuse the KV cache.
- The paper demonstrates signi
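The custom attention mask mentioned above can be pictured with a minimal sketch, assuming the tokens of parallel fields are laid out one field after another behind a shared prefix. Each field token attends to the whole prefix and causally to earlier tokens of its own field, but not to sibling fields decoded at the same time. The layout and the helper `parallel_field_mask` are assumptions made for illustration; the paper's actual mask, position handling, and KV-cache details may differ.

```python
def parallel_field_mask(prefix_len: int, field_lens: list[int]) -> list[list[bool]]:
    """Return mask[q][k]; True means query token q may attend to key token k."""
    total = prefix_len + sum(field_lens)
    mask = [[False] * total for _ in range(total)]

    # Shared prefix: ordinary causal attention.
    for q in range(prefix_len):
        for k in range(q + 1):
            mask[q][k] = True

    start = prefix_len
    for length in field_lens:
        for q in range(start, start + length):
            # Every field token sees the full shared prefix.
            for k in range(prefix_len):
                mask[q][k] = True
            # Causal attention within the token's own field only.
            for k in range(start, q + 1):
                mask[q][k] = True
        start += length
    return mask


# Two prefix tokens, then two hypothetical fields of 2 and 3 tokens.
mask = parallel_field_mask(prefix_len=2, field_lens=[2, 3])
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

The block-diagonal pattern after the prefix columns is what keeps concurrently decoded fields independent of each other while they all share the same cached prefix.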

Weaknesses

- The core contribution heavily relies on a manually designed, highly fixed CoT template. Although the authors mention this template is an "example" (line 185), the entire methodology (including the dependency graph construction) is based on this fixed structure. Furthermore, when handling "multi-instance" fields (like lanes and critical objects), the method depends on a fixed number of slots (e.g., 3 time ranges for lanes, 4 critical objects). This might be fragile in complex, dynamic real-worl

Taxonomy

Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics