Judge, Then Drive: A Critic-Centric Vision Language Action Framework for Autonomous Driving
Lijin Yang, Jianing Huang, Zhongzhan Huang, Shu Liu, Hao Yang

TL;DR
This paper introduces CriticVLA, a two-stage vision language action (VLA) framework for autonomous driving that first generates a rough trajectory and then uses a VLA-based critic to evaluate and refine it, reaching a 73.33% success rate on Bench2Drive.
Contribution
The paper proposes a critic-centric VLA framework that extends the role of the VLA from acting to judging, supported by a synthetic dataset of 12.9 million annotated trajectories used to refine driving decisions.
Findings
CriticVLA achieves 73.33% success rate on Bench2Drive.
It delivers about 30% improvement in challenging scenarios.
The framework outperforms state-of-the-art baselines.
Abstract
Recent advances in vision language action (VLA) models have shown remarkable potential for autonomous driving by directly mapping multimodal inputs to control signals. However, previous VLA-based methods have not explicitly exploited the critic capability of VLAs to refine driving decisions, even though such capability has been well demonstrated in other LLM-based domains, thereby limiting their performance in complex closed-loop scenarios. In this work, we present a theoretically inspired two-stage framework, CriticVLA, which extends the role of VLAs from acting to judging. CriticVLA first generates a rough trajectory and then refines it through multimodal evaluation and single-step optimization guided by a VLA-based critic, yielding higher-quality driving behaviors. To support this process, we construct a large-scale synthetic dataset of 12.9 million annotated trajectories covering…
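The generate-then-refine loop described in the abstract can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the hand-written `critic_cost` stands in for the learned VLA-based critic, the single gradient step is only one possible reading of "single-step optimization", and the names `critic_cost`, `numerical_grad`, and `refine_once` are hypothetical.

```python
import numpy as np

def critic_cost(traj, obstacle, safe_radius=2.0):
    """Hypothetical stand-in for a learned critic; lower is better.

    Penalizes waypoints closer than safe_radius to an obstacle,
    plus a small penalty for lateral offset from the lane center y = 0.
    """
    dist = np.linalg.norm(traj - obstacle, axis=1)
    collision = np.sum(np.maximum(0.0, safe_radius - dist) ** 2)
    lane = 0.1 * np.sum(traj[:, 1] ** 2)
    return collision + lane

def numerical_grad(f, x, eps=1e-5):
    """Central-difference gradient of a scalar function f at array x."""
    grad = np.zeros_like(x)
    for idx in np.ndindex(x.shape):
        step = np.zeros_like(x)
        step[idx] = eps
        grad[idx] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

def refine_once(traj, obstacle, step_size=0.5):
    """Apply one gradient step on the critic's cost (assumed refinement rule)."""
    grad = numerical_grad(lambda t: critic_cost(t, obstacle), traj)
    return traj - step_size * grad

# Stage 1 (assumed): a rough straight-line trajectory of 6 (x, y) waypoints.
rough = np.stack([np.linspace(0.0, 10.0, 6), np.zeros(6)], axis=1)
obstacle = np.array([4.0, 0.5])

# Stage 2 (assumed): judge the rough trajectory, then refine it once.
refined = refine_once(rough, obstacle)

print("cost before:", critic_cost(rough, obstacle))
print("cost after: ", critic_cost(refined, obstacle))
```

In this toy setup only the waypoint nearest the obstacle moves, and the critic's cost drops after the single refinement step. A learned critic would replace `critic_cost`, and the actual update rule may differ from a plain gradient step.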
