Abductive Ego-View Accident Video Understanding for Safe Driving Perception

Jianwu Fang, Lei-lei Li, Junfei Zhou, Junbin Xiao, Hongkai Yu, Chen Lv, Jianru Xue, and Tat-Seng Chua

arXiv:2403.00436·cs.CV·March 4, 2024·1 cites

TL;DR

This paper introduces MM-AU, a large multi-modal accident video dataset with annotations, and proposes AdVersa-SD, an abductive framework using object-centric diffusion and CLIP for understanding accident cause-effect chains to enhance safe driving perception.

Contribution

The paper presents a new dataset, MM-AU, and a novel abductive video understanding framework, AdVersa-SD, that models accident causality using object-centric video diffusion and contrastive learning.

Findings

01. AdVersa-SD outperforms state-of-the-art diffusion models in accident understanding.

02. MM-AU enables comprehensive benchmarking for accident detection and reasoning.

03. The framework effectively captures cause-effect relationships in accident videos.

Abstract

We present MM-AU, a novel dataset for Multi-Modal Accident video Understanding. MM-AU contains 11,727 in-the-wild ego-view accident videos, each with temporally aligned text descriptions. We annotate over 2.23 million object boxes and 58,650 pairs of video-based accident reasons, covering 58 accident categories. MM-AU supports various accident understanding tasks, particularly multimodal video diffusion to understand accident cause-effect chains for safe driving. With MM-AU, we present an Abductive accident Video understanding framework for Safe Driving perception (AdVersa-SD). AdVersa-SD performs video diffusion via an Object-Centric Video Diffusion (OAVD) method which is driven by an abductive CLIP model. This model involves a contrastive interaction loss to learn the pair co-occurrence of normal, near-accident, accident frames with the corresponding text descriptions, such as…
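
The abstract mentions a contrastive interaction loss that ties normal, near-accident, and accident frames to their text descriptions, but the excerpt here does not give its form. As a rough illustration only, the sketch below shows a generic CLIP-style symmetric contrastive loss over matched frame/text embedding pairs; it is not the paper's loss. The function names, the toy 3-dimensional embeddings, and the temperature of 0.07 are all assumptions made for this example.

```python
import math

def normalize(v):
    # Scale a vector to unit length (guarding against an all-zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def log_softmax(row):
    # Numerically stable log-softmax over a list of logits.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def symmetric_contrastive_loss(frame_embs, text_embs, temperature=0.07):
    # Generic CLIP-style loss (an assumption, not the paper's formulation):
    # frame i and text i are a positive pair; every other pairing in the
    # batch is treated as a negative.
    frames = [normalize(v) for v in frame_embs]
    texts = [normalize(v) for v in text_embs]
    n = len(frames)
    # Cosine similarities scaled by the temperature.
    logits = [[sum(a * b for a, b in zip(f, t)) / temperature for t in texts]
              for f in frames]
    # Frame-to-text direction: each row should peak on its diagonal entry.
    f2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-frame direction: same idea over the columns.
    cols = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2f = -sum(log_softmax(cols[j])[j] for j in range(n)) / n
    return 0.5 * (f2t + t2f)

# Hypothetical embeddings standing in for the three frame states
# (normal, near-accident, accident) and their text descriptions.
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
frame_embs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.9]]

loss_matched = symmetric_contrastive_loss(frame_embs, text_embs)
# Rotating the frames breaks the pairing, so the loss should rise.
loss_shuffled = symmetric_contrastive_loss(frame_embs[1:] + frame_embs[:1], text_embs)
```

With correctly paired inputs the loss stays near zero, and it grows once the frame/text pairing is broken, which is the co-occurrence signal a contrastive objective of this kind is meant to provide.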

Taxonomy

Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Generative Adversarial Networks and Image Synthesis

Methods: Diffusion · Contrastive Language-Image Pre-training