Eyes on the Road, Mind Beyond Vision: Context-Aware Multi-modal Enhanced Risk Anticipation
Jiaxun Zhang, Haicheng Liao, Yumu Xie, Chengyue Wang, Yanchen Guan, Bin Rao, Zhenning Li

TL;DR
This paper introduces CAMERA, a multi-modal framework that combines visual, textual, and attention data to improve accident anticipation in dynamic traffic scenarios, achieving state-of-the-art results.
Contribution
The paper presents a novel adaptive, multi-modal approach with hierarchical fusion and a Geo-Context module for more accurate and interpretable accident prediction.
Findings
Achieves state-of-the-art accuracy on the DADA-2000 benchmark.
Reduces false alarms while maintaining high recall.
Improves lead time for accident anticipation.
Abstract
Accurate accident anticipation remains challenging when driver cognition and dynamic road conditions are underrepresented in predictive models. In this paper, we propose CAMERA (Context-Aware Multi-modal Enhanced Risk Anticipation), a multi-modal framework integrating dashcam video, textual annotations, and driver attention maps for robust accident anticipation. Unlike existing methods that rely on static or environment-centric thresholds, CAMERA employs an adaptive mechanism guided by scene complexity and gaze entropy, reducing false alarms while maintaining high recall in dynamic, multi-agent traffic scenarios. A hierarchical fusion pipeline with Bi-GRU (Bidirectional GRU) captures spatio-temporal dependencies, while a Geo-Context Vision-Language module translates 3D spatial relationships into interpretable, human-centric alerts. Evaluations on the DADA-2000 and benchmarks show that…
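The adaptive mechanism described in the abstract can be illustrated with a small sketch. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names `gaze_entropy` and `adaptive_threshold`, the parameters `base` and `span`, the linear rule, and the choice to lower the alert threshold as gaze becomes more dispersed. Scene complexity, which the abstract also names as a guiding signal, is left out to keep the example short.

```python
import math


def gaze_entropy(attention_map):
    """Shannon entropy (in bits) of a 2D driver attention map.

    The map is normalized to a probability distribution first, so any
    non-negative saliency values can be passed in.
    """
    flat = [v for row in attention_map for v in row]
    total = sum(flat)
    if total <= 0:
        raise ValueError("attention map must have positive total mass")
    probs = [v / total for v in flat if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def adaptive_threshold(entropy, max_entropy, base=0.5, span=0.2):
    """Hypothetical linear rule (not taken from the paper).

    Focused gaze (low entropy) raises the alert threshold; dispersed gaze
    (high entropy) lowers it. The result stays within base +/- span / 2.
    """
    ratio = min(max(entropy / max_entropy, 0.0), 1.0)
    return base + span * (0.5 - ratio)


# A focused map (all attention in one cell) versus a uniform map.
focused = [[0.0, 0.0], [0.0, 1.0]]
uniform = [[1.0, 1.0], [1.0, 1.0]]
max_h = math.log2(4)  # maximum entropy for a 2x2 map

t_focused = adaptive_threshold(gaze_entropy(focused), max_h)
t_uniform = adaptive_threshold(gaze_entropy(uniform), max_h)
```

Under these assumptions the focused map yields a higher threshold than the uniform one, so an alert fires more readily when the driver's attention is spread across the scene. A static threshold would treat both cases identically, which is the limitation the abstract attributes to existing methods.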
Taxonomy
Topics: Data Visualization and Analytics · Multimodal Machine Learning Applications · Mobile Crowdsensing and Crowdsourcing
