Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, and Tat-Seng Chua

TL;DR
This paper introduces MM-AU, a large multi-modal accident video dataset with annotations, and proposes AdVersa-SD, an abductive framework using object-centric diffusion and CLIP for understanding accident cause-effect chains to enhance safe driving perception.
Contribution
The paper presents a new dataset, MM-AU, and a novel abductive video understanding framework, AdVersa-SD, which models accident causality using object-centric diffusion and contrastive learning.
Findings
AdVersa-SD outperforms state-of-the-art diffusion models in accident understanding.
MM-AU enables comprehensive benchmarking for accident detection and reasoning.
The framework effectively captures cause-effect relationships in accident videos.
Abstract
We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions. We annotate over 2.23 million object boxes and 58,650 pairs of video-based accident reasons, covering 58 accident categories. MM-AU supports various accident understanding tasks, particularly multimodal video diffusion to understand accident cause-effect chains for safe driving. With MM-AU, we present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven by an abductive CLIP model. This model involves a contrastive interaction loss to learn the pair co-occurrence of normal, near-accident, accident frames with the corresponding text descriptions, such as…
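As a rough illustration of what a contrastive loss over frame-text pair co-occurrence can look like, the sketch below implements a generic CLIP-style symmetric contrastive loss in plain Python. It is not the paper's contrastive interaction loss, whose exact form is not given in the abstract. The function names, the temperature value of 0.07, and the toy 8-dimensional embeddings are all assumptions made for illustration; the three pairs loosely stand in for normal, near-accident, and accident frames with their descriptions.

```python
import math
import random


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax_at(scores, target):
    """Log-probability of index `target` under a softmax over `scores`."""
    top = max(scores)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return scores[target] - log_z


def symmetric_contrastive_loss(frame_embs, text_embs, temperature=0.07):
    """Generic CLIP-style loss: frame i and text i form the only positive pair.

    Hypothetical helper for illustration, not the loss defined in the paper.
    """
    frames = [normalize(v) for v in frame_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(frames)
    # logits[i][j] = cosine similarity of frame i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(frames[i], texts[j])) / temperature
         for j in range(n)]
        for i in range(n)
    ]
    # Frame-to-text direction: softmax over each row.
    frame_to_text = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Text-to-frame direction: softmax over each column.
    text_to_frame = -sum(
        log_softmax_at([logits[i][j] for i in range(n)], j) for j in range(n)
    ) / n
    return 0.5 * (frame_to_text + text_to_frame)


# Toy embeddings (assumed data): three text descriptions and three frame
# embeddings that sit close to their matching descriptions.
random.seed(0)
text_embs = [[random.gauss(0, 1) for _ in range(8)] for _ in range(3)]
frame_embs = [[x + 0.01 * random.gauss(0, 1) for x in v] for v in text_embs]

matched_loss = symmetric_contrastive_loss(frame_embs, text_embs)
# Rotating the frames breaks every frame-text pairing.
shuffled_loss = symmetric_contrastive_loss(frame_embs[1:] + frame_embs[:1], text_embs)
print(matched_loss, shuffled_loss)
```

In this toy setup the correctly paired batch yields a lower loss than the rotated one, which is the behavior a co-occurrence objective of this kind is meant to reward.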
Taxonomy
Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis
Methods: Diffusion · Contrastive Language-Image Pre-training
