SafeAlign-VLA: A Negative-Enhanced Safe Alignment Framework for Risk-Aware Autonomous Driving
Kefei Tian, Yuansheng Lian, Kai Yang, Xiangdong Chen, Shen Li

TL;DR
SafeAlign-VLA introduces a negative data integration framework for risk-aware autonomous driving, enhancing safety and robustness by leveraging counterfactual reasoning and contrastive learning.
Contribution
Proposes a negative-enhanced safe alignment framework that incorporates negative samples into training so that VLA models better understand safety boundaries.
Findings
Achieves 89.1 PDMS on NAVSIM v1, surpassing the baseline by 1.3%.
Reduces the collision rate to 3.36% on DeepAccident.
Maintains high language and risk prediction accuracy (84.2% and 85.8%, respectively).
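For context on the PDMS figure, the sketch below shows how the NAVSIM v1 PDM score is commonly described: two multiplicative penalties (no at-fault collision, drivable area compliance) scale a weighted average of ego progress, time-to-collision, and comfort with weights 5, 5, and 2. The function and argument names are illustrative and do not come from this paper or from the NAVSIM codebase. Sub-scores are assumed to lie in [0, 1], with the final value multiplied by 100 when reported.

```python
def pdm_score(nc: float, dac: float, ep: float, ttc: float, comfort: float) -> float:
    """Combine NAVSIM v1-style sub-metrics (each in [0, 1]) into one score.

    nc      : no at-fault collision (multiplicative penalty)
    dac     : drivable area compliance (multiplicative penalty)
    ep      : ego progress
    ttc     : time-to-collision compliance
    comfort : comfort
    """
    for name, value in (("nc", nc), ("dac", dac), ("ep", ep),
                        ("ttc", ttc), ("comfort", comfort)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # Weighted average of the soft sub-scores (weights 5, 5, 2; sum 12).
    weighted = (5.0 * ep + 5.0 * ttc + 2.0 * comfort) / 12.0
    # Hard penalties zero out the score on a collision or off-road event.
    return nc * dac * weighted
```

Because the penalties are multiplicative, a single at-fault collision (`nc = 0`) drives the score for that scenario to zero regardless of progress or comfort, which is why safety-focused training can move this metric.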
Abstract
End-to-end autonomous driving systems excel in common scenarios but struggle with safety-critical long-tail cases. Vision-Language-Action (VLA) models are promising due to their strong reasoning capabilities. However, most VLA-based approaches rely on positive expert demonstrations, rarely exploiting negative samples, leading to insufficient understanding of risky behaviors and safety boundaries. To address this limitation, we propose SafeAlign-VLA, a unified negative-enhanced safe alignment framework that incorporates negative data into supervised learning and reinforcement learning. First, we develop a counterfactual safety pairing paradigm to generate structured safety labels and counterfactual positive trajectories from risky scenarios via counterfactual reasoning. Then, a two-stage training strategy is adopted: negative-enhanced supervised fine-tuning for failure feedback and…
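To make the idea of pairing a risky trajectory with a counterfactual safe one concrete, here is a minimal sketch of one way such a pair could be used contrastively. It is not the paper's loss, which this excerpt does not specify. It assumes a standard triplet-style margin objective, trajectories given as equal-length lists of (x, y) waypoints, and the hypothetical names `mean_l2` and `pair_margin_loss`.

```python
import math
from typing import Sequence, Tuple

Waypoint = Tuple[float, float]


def mean_l2(a: Sequence[Waypoint], b: Sequence[Waypoint]) -> float:
    """Average pointwise Euclidean distance between two trajectories."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("trajectories must be non-empty and equal length")
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def pair_margin_loss(pred: Sequence[Waypoint],
                     safe: Sequence[Waypoint],
                     risky: Sequence[Waypoint],
                     margin: float = 1.0) -> float:
    """Triplet-style loss (an assumption, not the paper's formulation).

    Pulls the predicted trajectory toward the counterfactual safe one and
    pushes it at least `margin` farther from the risky one.
    """
    return max(0.0, margin + mean_l2(pred, safe) - mean_l2(pred, risky))


# Hypothetical pair: the safe plan goes straight, the risky plan drifts sideways.
safe = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
risky = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
loss_when_safe = pair_margin_loss(safe, safe, risky)    # prediction matches safe plan
loss_when_risky = pair_margin_loss(risky, safe, risky)  # prediction matches risky plan
```

In this toy pair the loss vanishes when the prediction coincides with the safe trajectory and is largest when it coincides with the risky one, which is the behaviour a negative-aware objective is meant to encourage.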
