DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking

Weicheng Zheng; Xiaofei Mao; Nanfei Ye; Pengxiang Li; Kun Zhan; Xianpeng Lang; Hang Zhao

arXiv:2507.20879 · cs.CV · April 21, 2026

TL;DR

DriveAgent-R1 introduces an active perception and hybrid reasoning framework for autonomous driving, enabling visual evidence seeking and adaptive thinking to improve interpretability and performance.

Contribution

DriveAgent-R1 is presented as the first autonomous driving agent to combine active perception with hybrid thinking: it grounds planning decisions in visual evidence it actively seeks out, and it adapts its reasoning strategy to the scenario.

Findings

01. Achieves performance comparable to top models and humans on driving benchmarks.

02. Utilizes a three-stage training strategy with Cascaded Reinforcement Learning.

03. Operates effectively with only 3 billion parameters.

Abstract

The advent of Vision-Language Models (VLMs) has significantly advanced end-to-end autonomous driving, demonstrating powerful reasoning abilities for high-level behavior planning tasks. However, existing methods are often constrained by a passive perception paradigm, relying solely on text-based reasoning. This passivity restricts the model's capacity to actively seek crucial visual evidence when faced with uncertainty. To address this, we introduce DriveAgent-R1, the first autonomous driving agent capable of active perception for planning. In complex scenarios, DriveAgent-R1 proactively invokes tools to perform visual reasoning, firmly grounding its decisions in visual evidence, thereby enhancing both interpretability and reliability. Furthermore, we propose a hybrid thinking framework, inspired by human driver cognitive patterns, allowing the agent to adaptively switch between…

Videos

DriveAgent-R1: Advancing VLM-based Autonomous Driving with Active Perception and Hybrid Thinking (SlidesLive)