ScVLM: Enhancing Vision-Language Model for Safety-Critical Event Understanding

Liang Shi; Boyu Jiang; Tong Zeng; Feng Guo

arXiv:2410.00982 · cs.CV · March 11, 2025

TL;DR

ScVLM is a hybrid vision-language model that improves understanding and description of safety-critical traffic events, reducing hallucinations and enhancing accuracy for driver assistance systems.

Contribution

It introduces a hybrid training approach that combines supervised and contrastive learning to improve safety-critical event (SCE) classification and description in vision-language models.
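
The page does not say how the supervised and contrastive parts are combined, so the following is only a minimal sketch of one common pattern: a cross-entropy term on class logits plus a weighted symmetric InfoNCE term that pulls each video embedding toward its paired narrative embedding. The function names, the `weight` and `temperature` parameters, and the weighted-sum combination are all assumptions made for illustration, not the authors' formulation.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one list of scores.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def cross_entropy(logits, labels):
    # Supervised term: mean cross-entropy over a batch of class logits.
    return -sum(log_softmax(r)[y] for r, y in zip(logits, labels)) / len(labels)

def normalize(v):
    # Scale a vector to unit length (zero vectors are left unchanged).
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def info_nce(video_emb, text_emb, temperature=0.07):
    # Contrastive term: the i-th video and i-th narrative are the positive pair;
    # every other pairing in the batch acts as a negative.
    v = [normalize(x) for x in video_emb]
    t = [normalize(x) for x in text_emb]
    sim = [[sum(a * b for a, b in zip(vi, tj)) / temperature for tj in t] for vi in v]
    n = len(sim)
    video_to_text = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    sim_t = [list(col) for col in zip(*sim)]  # transpose for the text-to-video direction
    text_to_video = -sum(log_softmax(sim_t[i])[i] for i in range(n)) / n
    return 0.5 * (video_to_text + text_to_video)

def hybrid_loss(logits, labels, video_emb, text_emb, weight=0.5):
    # Hypothetical combination: supervised loss plus a weighted contrastive loss.
    return cross_entropy(logits, labels) + weight * info_nce(video_emb, text_emb)
```

In this sketch, a batch whose video and narrative embeddings are correctly paired yields a lower contrastive term than the same batch with the pairs swapped, which is the behavior a contrastive objective is meant to reward.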

Findings

01 Outperforms existing models in generating accurate SCE descriptions.

02 Reduces hallucinations in vision-language event understanding.

03 Trained and evaluated on more than 8,600 real-world SCEs.

Abstract

Accurately identifying, understanding and describing traffic safety-critical events (SCEs), including crashes, tire strikes, and near-crashes, is crucial for advanced driver assistance systems, automated driving systems, and traffic safety. As SCEs are rare events, most general vision-language models (VLMs) have not been trained sufficiently to link SCE videos and narratives, which could lead to hallucinations and missing key safety characteristics. Here, we introduce ScVLM, a novel hybrid methodology that integrates supervised and contrastive learning techniques to classify the severity and types of SCEs, as well as to generate narrative descriptions of SCEs. This approach utilizes classification to enhance VLMs' comprehension of driving videos and improve the rationality of event descriptions. The proposed approach is trained on and evaluated by more than 8,600 SCEs from the Second…

Code & Models

Repositories

datadrivenwheels/scvlm
PyTorch · Official

Taxonomy

Topics: Safety Warnings and Signage

Methods: Contrastive Learning