When, Where, and What? A Novel Benchmark for Accident Anticipation and Localization with Large Language Models
Haicheng Liao, Yongkang Li, Chengyue Wang, Yanchen Guan, KaHou Tam, Chunlin Tian, Li Li, Chengzhong Xu, Zhenning Li

TL;DR
This paper presents a framework that integrates Large Language Models with a chain-based attention mechanism to improve accident anticipation and localization in autonomous driving, reporting superior AP and mTTA on the DAD, CCD, and A3D datasets.
Contribution
Introduces a new multimodal, LLM-based framework with a chain-based attention mechanism for accident prediction and localization in autonomous driving.
Findings
Superior Average Precision (AP) and mean Time-To-Accident (mTTA) on the DAD, CCD, and A3D datasets
Establishes new benchmarks for accident anticipation
Enhances human-AI interaction in traffic safety
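
For readers unfamiliar with the mTTA metric named in the findings, the sketch below shows one common way Time-To-Accident is computed from per-frame risk scores: the gap between the first frame whose score crosses a threshold and the annotated accident frame. This is only an illustration. The function names, the threshold of 0.5, and the frame rate of 20 fps are assumptions, and this summary does not state the paper's exact evaluation protocol (for example, whether it averages over several thresholds).

```python
from typing import Optional, Sequence, Tuple


def time_to_accident(
    scores: Sequence[float],
    accident_frame: int,
    threshold: float = 0.5,   # assumed decision threshold
    fps: float = 20.0,        # assumed video frame rate
) -> Optional[float]:
    """Seconds between the first alarm and the accident onset.

    Returns None when no score before the accident frame reaches the threshold.
    """
    for frame_idx, score in enumerate(scores[:accident_frame]):
        if score >= threshold:
            return (accident_frame - frame_idx) / fps
    return None


def mean_tta(
    videos: Sequence[Tuple[Sequence[float], int]],
    threshold: float = 0.5,
    fps: float = 20.0,
) -> float:
    """Average TTA over the videos in which an alarm was raised (0.0 if none)."""
    ttas = [time_to_accident(s, a, threshold, fps) for s, a in videos]
    hits = [t for t in ttas if t is not None]
    return sum(hits) / len(hits) if hits else 0.0
```

A larger value means the model raised its alarm earlier. Videos in which the alarm never fires are skipped here, which is a design choice of this sketch rather than something the source specifies.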
Abstract
As autonomous driving systems increasingly become part of daily transportation, the ability to accurately anticipate and mitigate potential traffic accidents is paramount. Traditional accident anticipation models primarily utilizing dashcam videos are adept at predicting when an accident may occur but fall short in localizing the incident and identifying involved entities. Addressing this gap, this study introduces a novel framework that integrates Large Language Models (LLMs) to enhance predictive capabilities across multiple dimensions--what, when, and where accidents might occur. We develop an innovative chain-based attention mechanism that dynamically adjusts to prioritize high-risk elements within complex driving scenes. This mechanism is complemented by a three-stage model that processes outputs from smaller models into detailed multimodal inputs for LLMs, thus enabling a more…
Taxonomy
Methods: Softmax · Attention Is All You Need
