ThinkDrive: Chain-of-Thought Guided Progressive Reinforcement Learning Fine-Tuning for Autonomous Driving

Chang Zhao; Zheming Yang; Yunqing Hu; Qi Guo; Zijian Wang; Pengcheng Li; Wen Ji

arXiv:2601.04714 · cs.AI · January 9, 2026

TL;DR

ThinkDrive is a two-stage training framework that combines chain-of-thought supervised fine-tuning with progressive reinforcement learning, improving autonomous driving decision-making and generalization over existing methods.

Contribution

It proposes a CoT-guided progressive RL fine-tuning approach with difficulty-aware adaptive policy optimization for autonomous driving.

Findings

01  Outperforms strong RL baselines by 1.45%, 1.95%, and 1.01% on three evaluation metrics.

02  A 2B-parameter model trained with ThinkDrive surpasses GPT-4o by 3.28%.

03  Demonstrates improved decision transparency and generalization in autonomous driving.

Abstract

With the rapid advancement of large language model (LLM) technologies, their application in autonomous driving has become increasingly widespread. However, existing methods suffer from unstructured reasoning, poor generalization, and misalignment with human driving intent. While Chain-of-Thought (CoT) reasoning enhances decision transparency, conventional supervised fine-tuning (SFT) fails to fully exploit its potential, and reinforcement learning (RL) approaches face instability and suboptimal reasoning depth. We propose ThinkDrive, a CoT-guided progressive RL fine-tuning framework for autonomous driving that synergizes explicit reasoning with difficulty-aware adaptive policy optimization. Our method employs a two-stage training strategy. First, we perform SFT using CoT explanations. Then, we apply progressive RL with a difficulty-aware adaptive policy optimizer that…

Taxonomy

Topics: Reinforcement Learning in Robotics · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)