Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets
Alex Koran, Dimitrios Sinodinos, Hadi Hojjati, Takuya Nanri, Fangge Chen, Narges Armanfard

TL;DR
This paper introduces VLAAD, a collision-aware Video-Language-Augmented Anomaly Detector for end-to-end autonomous driving, together with CARLA-Collide, a new multimodal collision dataset. It reports a 14.12% driving-score improvement when integrated into TransFuser++ in simulation and a 23.3% AUC increase on the real-world Real-Collide data.
Contribution
The paper presents VLAAD (Video-Language-Augmented Anomaly Detector), a collision-aware learning module trained on the CARLA-Collide and Real-Collide datasets that gives end-to-end driving models better collision prediction.
Findings
VLAAD improves driving scores by 14.12% when integrated into TransFuser++.
On Real-Collide, VLAAD outperforms larger models with a 23.3% AUC increase.
CARLA-Collide provides diverse, realistic collision data for training.
Abstract
High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by the low driving scores on the CARLA Leaderboard. Despite collision-related infractions being the dominant failure mode in closed-loop evaluations, collision-aware representation learning has received limited attention. To address this gap, we first develop a Video-Language-Augmented Anomaly Detector (VLAAD), leveraging a Multiple Instance Learning (MIL) formulation to obtain stable, temporally localized collision signals for proactive prediction. To transition these capabilities into closed-loop simulations, we must overcome the limitations of existing simulator datasets, which lack multimodality and are frequently restricted to simple intersection scenarios. Therefore, we introduce CARLA-Collide, a large-scale multimodal dataset capturing realistic collision events across highly…
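The abstract mentions a Multiple Instance Learning (MIL) formulation but does not spell out the objective. As a rough illustration only, the sketch below shows the classic MIL ranking loss used in weakly supervised video anomaly detection: each video is a "bag" of per-segment scores, only video-level labels are assumed, and the top-scoring segment of a collision video is pushed above the top-scoring segment of a normal video. The function names, the margin, and the smoothness and sparsity weights are hypothetical choices for this sketch, not values confirmed by the paper, and VLAAD's actual loss may differ.

```python
from typing import Sequence


def mil_ranking_loss(
    collision_scores: Sequence[float],
    normal_scores: Sequence[float],
    margin: float = 1.0,
    smooth_weight: float = 8e-5,   # hypothetical weight, not from the paper
    sparse_weight: float = 8e-5,   # hypothetical weight, not from the paper
) -> float:
    """Generic MIL ranking loss over two bags of per-segment scores in [0, 1]."""
    # Bag-level score is the maximum segment score in each video.
    hinge = max(0.0, margin - max(collision_scores) + max(normal_scores))
    # Temporal smoothness: penalize jumps between adjacent segment scores.
    smooth = sum(
        (a - b) ** 2 for a, b in zip(collision_scores, collision_scores[1:])
    )
    # Sparsity: collisions should occupy only a few segments of the video.
    sparse = sum(collision_scores)
    return hinge + smooth_weight * smooth + sparse_weight * sparse


def localize(scores: Sequence[float]) -> int:
    """Return the index of the segment most likely to contain the collision."""
    return max(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    collision_video = [0.1, 0.9, 0.2]   # made-up segment scores
    normal_video = [0.1, 0.2, 0.1]      # made-up segment scores
    print(mil_ranking_loss(collision_video, normal_video))
    print(localize(collision_video))
```

Taking the maximum over segments is what lets a video-level label yield a temporally localized signal: the gradient flows only through the segment the model currently considers most collision-like.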
