Collision-Aware Vision-Language Learning for End-to-End Driving with Multimodal Infraction Datasets

Alex Koran; Dimitrios Sinodinos; Hadi Hojjati; Takuya Nanri; Fangge Chen; Narges Armanfard

arXiv:2603.25946·cs.CV·March 30, 2026

TL;DR

This paper introduces VLAAD, a collision-aware learning module for end-to-end autonomous driving, together with a new multimodal collision dataset. It reports higher driving scores in simulation and better collision detection on real-world data.

Contribution

The paper presents VLAAD, a novel collision-aware learning module trained on CARLA-Collide and Real-Collide datasets, enhancing end-to-end driving models with better collision prediction capabilities.

Findings

01 VLAAD improves driving scores by 14.12% when integrated into TransFuser++.

02 On Real-Collide, VLAAD outperforms larger models with a 23.3% AUC increase.

03 CARLA-Collide provides diverse, realistic collision data for training.

Abstract

High infraction rates remain the primary bottleneck for end-to-end (E2E) autonomous driving, as evidenced by the low driving scores on the CARLA Leaderboard. Despite collision-related infractions being the dominant failure mode in closed-loop evaluations, collision-aware representation learning has received limited attention. To address this gap, we first develop a Video-Language-Augmented Anomaly Detector (VLAAD), leveraging a Multiple Instance Learning (MIL) formulation to obtain stable, temporally localized collision signals for proactive prediction. To transition these capabilities into closed-loop simulations, we must overcome the limitations of existing simulator datasets, which lack multimodality and are frequently restricted to simple intersection scenarios. Therefore, we introduce CARLA-Collide, a large-scale multimodal dataset capturing realistic collision events across highly…
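The abstract says VLAAD uses a Multiple Instance Learning (MIL) formulation but does not spell out the objective. As a rough illustration of what a MIL formulation for video anomaly detection commonly looks like, the sketch below treats each video as a bag of segment scores and applies a generic hinge ranking loss between a collision video and a normal video. This is an assumption about the general technique, not the paper's loss: the function names, the margin, and the optional smoothness and sparsity weights are all hypothetical.

```python
from typing import List, Sequence


def bag_score(segment_scores: Sequence[float]) -> float:
    """Bag-level score: the highest segment score in the video."""
    return max(segment_scores)


def mil_ranking_loss(
    collision_scores: Sequence[float],
    normal_scores: Sequence[float],
    margin: float = 1.0,          # hypothetical default, not from the paper
    smooth_weight: float = 0.0,   # hypothetical regularizer weight
    sparse_weight: float = 0.0,   # hypothetical regularizer weight
) -> float:
    """Generic MIL hinge ranking loss over two bags of segment scores.

    Only video-level labels are needed: the top-scoring segment of the
    collision video should outrank the top-scoring segment of the normal
    video by at least `margin`.
    """
    hinge = max(0.0, margin - bag_score(collision_scores) + bag_score(normal_scores))
    # Temporal smoothness: penalize jumps between adjacent segment scores.
    smooth = sum((b - a) ** 2 for a, b in zip(collision_scores, collision_scores[1:]))
    # Sparsity: a collision should occupy only a few segments.
    sparse = sum(collision_scores)
    return hinge + smooth_weight * smooth + sparse_weight * sparse


def localize(segment_scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Indices of segments whose score reaches the (assumed) threshold."""
    return [i for i, s in enumerate(segment_scores) if s >= threshold]


if __name__ == "__main__":
    collision = [0.1, 0.9, 0.2]   # made-up scores for illustration
    normal = [0.1, 0.2, 0.1]
    print(mil_ranking_loss(collision, normal))
    print(localize(collision))
```

Taking the maximum over segments is what makes the signal temporally localized in this kind of setup: the loss only needs a video-level label, yet the segment that drives the bag score indicates where the collision is likely to occur.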
