ReCogDrive: A Reinforced Cognitive Framework for End-to-End Autonomous Driving
Yongkang Li, Kaixin Xiong, Xiangyu Guo, Fang Li, Sixu Yan, Gangwei Xu, Lijun Zhou, Long Chen, Haiyang Sun, Bing Wang, Kun Ma, Guang Chen, Hangjun Ye, Wenyu Liu, Xinggang Wang

TL;DR
ReCogDrive introduces a reinforced cognitive framework that unifies understanding and planning in autonomous driving, leveraging hierarchical data pipelines and diffusion models to improve safety, efficiency, and scene comprehension.
Contribution
The paper presents a novel integrated framework combining hierarchical data processing and diffusion planning to enhance end-to-end autonomous driving performance.
Findings
Achieves state-of-the-art results on NAVSIM and Bench2Drive benchmarks.
Demonstrates improved safety and reduced collisions in diverse scenarios.
Shows enhanced scene understanding and driving stability.
Abstract
Recent studies have explored leveraging the world knowledge and cognitive capabilities of Vision-Language Models (VLMs) to address the long-tail problem in end-to-end autonomous driving. However, existing methods typically formulate trajectory planning as a language modeling task, where physical actions are output in the language space, potentially leading to issues such as format-violating outputs, infeasible actions, and slow inference speeds. In this paper, we propose ReCogDrive, a novel Reinforced Cognitive framework for end-to-end autonomous Driving, unifying driving understanding and planning by integrating an autoregressive model with a diffusion planner. First, to instill human driving cognition into the VLM, we introduce a hierarchical data pipeline that mimics the sequential cognitive process of human drivers through three stages: generation, refinement, and quality control.…
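To make the format-violation issue in the abstract concrete, here is a minimal sketch of what a downstream consumer has to do when a planner emits its trajectory as free-form text. The `(x, y), (x, y), ...` text format, the `parse_trajectory` name, and the `expected_points` parameter are assumptions for illustration only and are not taken from the paper.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical text format for a planned trajectory: "(x, y), (x, y), ..."
WAYPOINT = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_trajectory(
    text: str, expected_points: int
) -> Optional[List[Tuple[float, float]]]:
    """Parse waypoints from generated text; return None on a format violation."""
    points = [(float(x), float(y)) for x, y in WAYPOINT.findall(text)]
    # Remove every well-formed waypoint and the separating commas.
    # Anything left over is a stray or truncated token.
    leftover = WAYPOINT.sub("", text).replace(",", "").strip()
    if leftover or len(points) != expected_points:
        return None
    return points


well_formed = parse_trajectory("(1.0, 2.0), (3.5, -4)", expected_points=2)
truncated = parse_trajectory("(1.0, 2.0), (3.5", expected_points=2)
```

In this sketch the truncated string is rejected outright, so the caller needs a fallback plan. A planner that outputs continuous waypoints directly, as a diffusion planner does, has no text-parsing step of this kind.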
Peer Reviews
Decision: ICLR 2026 Poster
The paper is well-written and easy to follow, with clear figures and good motivation. I like the overall integration of cognitive reasoning (via VLM) and low-level control (via diffusion + RL). Results are strong and consistent across benchmarks, and the ablation studies help justify each component. Technically, the approach is sound.
**(a) Novelty Overlap / Incremental Concerns** Several very recent works (e.g., Drive-R1 Li et al., 2025 and AlphaDrive Jiang et al., 2025) already explore reinforcement learning and reasoning within VLM-based driving. Likewise, Gen-Drive (Huang et al., 2025) combines diffusion with RL for driving policy optimization. Given these, the claim of “first to apply reinforcement learning to VLA models” may be over-stated, and the conceptual contribution, though strong in integration, is not entirel
- The integration of VLMs with diffusion models and reinforcement learning addresses key limitations in current end-to-end driving systems.
- The structured approach to data generation and refinement is scalable, and enables the creation of high-quality VQA datasets for autonomous driving.
- The use of a diffusion planner to bridge the gap between discrete language outputs and continuous control actions.
- The introduction of DiffGRPO is a thoughtful addition that enhances the planner’s ability
- Diffusion Planner: Integration of a diffusion planner with a VLM has already been explored in ORION. Despite the improvements, the diffusion-based approach still incurs higher inference latency than VLP and DiMA, which distill VLM knowledge into simpler planners. Comparisons with relevant methods such as ORION, VLP, and DiMA are missing.
- Zero-shot testing: The model is evaluated in simulation environments (NAVSIM, CARLA), but lacks real-world deployment or testing, which is crucial for autonomous drivin
1. Well-designed Framework: The core architecture that integrates a cognitive VLM, a diffusion planner, and an RL refinement stage is well designed. It provides an effective solution to the modality mismatch problem for VLM-based agents.
2. Strong Empirical Validation: The paper's claims are well-supported by strong empirical performance on two challenging benchmarks (NAVSIM and Bench2Drive), comprehensive ablation studies that isolate the gains from each component, and a methodical data curatio
**Major Weaknesses:**
1. Lack of Motivation for GRPO: The paper provides insufficient insight into the choice of Group Relative Policy Optimization (GRPO). It is not clear why GRPO is better suited for optimizing a diffusion policy in this context compared to other well-established RL algorithms (e.g., PPO, or a simpler policy gradient method like REINFORCE).
2. Unclear DiffGRPO Algorithm Design: The description of the DiffGRPO algorithm seems to imply a fundamental difference from typical GRPO
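For context on the GRPO question raised in this review, here is a minimal sketch of the group-relative advantage used in standard GRPO: several outputs are sampled for the same input, each is scored, and the rewards are standardized within that group, so no learned value function is needed (unlike PPO). This is the generic formulation only and is not the paper's DiffGRPO; the function name, the `eps` stabilizer, and the choice of population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Standardize rewards within one group of samples drawn for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (an assumption)
    # eps keeps the division defined when all rewards in the group are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four trajectories sampled in the same scene.
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Samples that score above the group mean get a positive advantage and those below get a negative one. If every sample in a group receives the same reward, all advantages are zero and that group contributes no learning signal.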
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Reinforcement Learning in Robotics · Transportation and Mobility Innovations
Methods: Diffusion
