StyleVLA: Driving Style-Aware Vision Language Action Model for Autonomous Driving
Yuan Gao, Dengyuan Hua, Mattia Piccinini, Finn Rasmus Schäfer, Korbinian Moller, Lin Li, Johannes Betz

TL;DR
StyleVLA is a physics-informed Vision Language Action (VLA) model for autonomous driving that generates diverse, feasible, and style-adherent driving behaviors, outperforming existing proprietary models on key metrics.
Contribution
We introduce StyleVLA, a novel physics-informed VLA framework with a hybrid loss and large-scale dataset, enabling personalized and physically plausible driving behavior generation.
Findings
StyleVLA outperforms proprietary models such as Gemini-3-Pro on driving scores.
The hybrid loss improves trajectory feasibility and style adherence.
A large-scale dataset supports diverse style-conditioned trajectory learning.
Abstract
Vision Language Models (VLMs) bridge visual perception and linguistic reasoning. In Autonomous Driving (AD), this synergy has enabled Vision Language Action (VLA) models, which translate high-level multimodal understanding into driving behaviors, typically represented as future trajectories. However, existing VLA models mainly generate generic collision-free trajectories. Beyond collision avoidance, adapting to diverse driving styles (e.g., sporty, comfortable) is essential for personalized driving. Moreover, many methods treat trajectory generation as naive token prediction, which can produce kinematically infeasible actions. To address these limitations, we present StyleVLA, a physics-informed VLA framework for generating diverse and physically plausible driving behaviors. We introduce a hybrid loss that combines a kinematic consistency constraint with a continuous regression head to…
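As a rough illustration of the idea of pairing a continuous regression term with a kinematic consistency term, the following is a minimal sketch, not the paper's formulation. The abstract is truncated before the loss is specified, so everything here is an assumption: the function names (`regression_loss`, `kinematic_penalty`, `hybrid_loss`), the use of a finite-difference acceleration bound as the kinematic term, and the values of `dt`, `a_max`, and the weight `lam` are all hypothetical.

```python
import math

def regression_loss(pred, target):
    """Mean squared error over (x, y) waypoints (assumed regression term)."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(pred, target)) / len(pred)

def kinematic_penalty(pred, dt=0.5, a_max=4.0):
    """Squared hinge penalty on acceleration magnitudes above a_max.

    Accelerations are estimated by finite differences of the waypoints.
    dt (seconds) and a_max (m/s^2) are illustrative values, not taken
    from the paper.
    """
    # Velocity between each pair of consecutive waypoints.
    vel = [((x1 - x0) / dt, (y1 - y0) / dt)
           for (x0, y0), (x1, y1) in zip(pred, pred[1:])]
    # Acceleration magnitude between each pair of consecutive velocities.
    acc = [math.hypot((vx1 - vx0) / dt, (vy1 - vy0) / dt)
           for (vx0, vy0), (vx1, vy1) in zip(vel, vel[1:])]
    if not acc:
        return 0.0
    return sum(max(0.0, a - a_max) ** 2 for a in acc) / len(acc)

def hybrid_loss(pred, target, lam=0.1, dt=0.5, a_max=4.0):
    """Weighted sum of the two terms; lam is a hypothetical weight."""
    return regression_loss(pred, target) + lam * kinematic_penalty(pred, dt, a_max)

# A constant-velocity trajectory incurs no kinematic penalty, while a
# trajectory that jumps from standstill to high speed does.
smooth = [(0.0, 0.0), (5.0, 0.0), (10.0, 0.0), (15.0, 0.0)]
jerky = [(0.0, 0.0), (0.0, 0.0), (10.0, 0.0)]
print(kinematic_penalty(smooth), kinematic_penalty(jerky))
```

In this sketch the kinematic term depends only on the predicted trajectory, so it penalizes infeasible outputs even when they sit close to the ground truth in position.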
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Advanced Neural Network Applications
