When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis

Ruixuan Zhang; Beichen Wang; Juexiao Zhang; Zilin Bian; Chen Feng; Kaan Ozbay

arXiv:2501.10604·cs.CV·June 18, 2025

When language and vision meet road safety: leveraging multimodal large language models for video-based traffic accident analysis

Ruixuan Zhang, Beichen Wang, Juexiao Zhang, Zilin Bian, Chen Feng, Kaan Ozbay

PDF

Open Access 1 Repo

TL;DR

This paper introduces SeeUnsafe, a multimodal large language model framework that transforms traffic accident video analysis into an interactive, conversational process, improving efficiency and adaptability over traditional methods.

Contribution

The paper presents a novel MLLM-based framework for traffic accident analysis that automates complex tasks and enables interactive, fine-grained insights from traffic videos.

Findings

01

Effective accident-aware video classification demonstrated

02

Enhanced visual grounding accuracy achieved

03

Framework outperforms traditional methods on traffic safety dataset

Abstract

The increasing availability of traffic videos functioning on a 24/7/365 time scale has the great potential of increasing the spatio-temporal coverage of traffic accidents, which will help improve traffic safety. However, analyzing footage from hundreds, if not thousands, of traffic cameras in a 24/7/365 working protocol remains an extremely challenging task, as current vision-based approaches primarily focus on extracting raw information, such as vehicle trajectories or individual object detection, but require laborious post-processing to derive actionable insights. We propose SeeUnsafe, a new framework that integrates Multimodal Large Language Model (MLLM) agents to transform video-based traffic accident analysis from a traditional extraction-then-explanation workflow to a more interactive, conversational approach. This shift significantly enhances processing throughput by automating…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

ai4ce/seeunsafe
pytorchOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsSentiment Analysis and Opinion Mining · Speech and dialogue systems · Multimodal Machine Learning Applications

MethodsFocus