Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
Mohammad Abu Tami, Huthaifa I. Ashqar, and Mohammed Elhenawy

TL;DR
This paper explores the use of Multimodal Large Language Models to automate the detection of safety-critical events in driving videos, aiming to improve accuracy and reliability in autonomous vehicle safety analysis.
Contribution
It introduces a novel framework leveraging MLLMs with context-specific prompts for hazard detection in driving videos, addressing hallucination issues and demonstrating potential in zero-shot learning.
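As a rough illustration of what "context-specific prompts" paired with a guard against hallucinated output could look like, the sketch below builds a prompt that restricts the model to a fixed label set and then rejects any reply outside it. This is not the authors' implementation. The label set, the function names `build_prompt` and `parse_response`, and the JSON reply format are all assumptions made for this example, and the call to an actual MLLM is deliberately left out.

```python
import json

# Hypothetical label set: the summary does not list the paper's event categories.
ALLOWED_EVENTS = {"none", "collision", "near_collision", "pedestrian_conflict"}


def build_prompt(scene_context: str) -> str:
    """Compose a context-specific prompt that constrains the answer format."""
    labels = ", ".join(sorted(ALLOWED_EVENTS))
    return (
        "You are analyzing frames from a driving video.\n"
        f"Scene context: {scene_context}\n"
        f"Choose exactly one event type from: {labels}.\n"
        'Reply with JSON only: {"event": <label>, "evidence": <short text>}.'
    )


def parse_response(raw: str) -> dict:
    """Accept a reply only if it is valid JSON with a label from the allowed set."""
    rejected = {"event": "unparsed", "evidence": ""}
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return rejected
    if not isinstance(data, dict):
        return rejected
    event = data.get("event")
    if not isinstance(event, str) or event not in ALLOWED_EVENTS:
        # An out-of-vocabulary label is treated as a possible hallucination.
        return rejected
    return {"event": event, "evidence": str(data.get("evidence", ""))}
```

Restricting the reply to a closed vocabulary is only one simple way to catch malformed or invented answers. It does not verify that the chosen label matches what is actually in the video.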
Findings
Preliminary results show promise in zero-shot scenario analysis.
Framework demonstrates potential for accurate safety-critical event detection.
Further validation on larger datasets is needed.
Abstract
Traditional approaches to safety event analysis in autonomous systems have relied on complex machine learning models and extensive datasets to achieve high accuracy and reliability. The advent of Multimodal Large Language Models (MLLMs) offers a novel alternative: by integrating textual, visual, and audio modalities, they can provide automated analyses of driving videos. Our framework leverages the reasoning power of MLLMs, directing their output through context-specific prompts to produce accurate, reliable, and actionable insights for hazard detection. By incorporating models like Gemini-Pro-Vision 1.5 and Llava, our methodology aims to automate the detection of safety-critical events and mitigate common issues such as hallucinations in MLLM outputs. Preliminary results demonstrate the framework's potential in zero-shot learning and accurate scenario analysis, though further validation on larger…
Taxonomy
Topics: Traffic Prediction and Management Techniques · Sentiment Analysis and Opinion Mining
