TL;DR
DriveAgent-R1 is an autonomous driving agent that combines active perception with hybrid reasoning: it seeks visual evidence when it is uncertain and adapts how it thinks to the scenario, improving both interpretability and performance.
Contribution
DriveAgent-R1 is presented as the first autonomous driving agent to combine active perception with hybrid thinking. It invokes tools for visual reasoning so that planning decisions are grounded in visual evidence, and it switches reasoning strategy adaptively.
Findings
Achieves performance comparable to top models and humans on driving benchmarks.
Uses a three-stage training strategy that includes Cascaded Reinforcement Learning.
Operates effectively with only 3 billion parameters.
Abstract
The advent of Vision-Language Models (VLMs) has significantly advanced end-to-end autonomous driving, demonstrating powerful reasoning abilities for high-level behavior planning tasks. However, existing methods are often constrained by a passive perception paradigm, relying solely on text-based reasoning. This passivity restricts the model's capacity to actively seek crucial visual evidence when faced with uncertainty. To address this, we introduce DriveAgent-R1, the first autonomous driving agent capable of active perception for planning. In complex scenarios, DriveAgent-R1 proactively invokes tools to perform visual reasoning, firmly grounding its decisions in visual evidence, thereby enhancing both interpretability and reliability. Furthermore, we propose a hybrid thinking framework, inspired by human driver cognitive patterns, allowing the agent to adaptively switch between…