HazardNet: A Small-Scale Vision Language Model for Real-Time Traffic Safety Detection at Edge Devices
Mohammad Abu Tami, Mohammed Elhenawy, and Huthaifa I. Ashqar

TL;DR
HazardNet is a compact vision-language model fine-tuned for real-time traffic safety detection on edge devices, utilizing a new VQA dataset to outperform larger models in accuracy and efficiency.
Contribution
This work introduces HazardNet, a small-scale vision-language model optimized for edge deployment, and HazardQA, a specialized dataset for safety-critical traffic scenarios.
Findings
HazardNet achieved up to 89% improvement in F1-Score over the base model.
HazardNet's performance is comparable to that of larger models such as GPT-4o, and in some cases up to 6% better.
The model enables real-time traffic safety detection on resource-constrained edge devices.
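The summary does not say whether the "up to 89% improvement in F1-Score" is a relative or an absolute gain. The sketch below assumes a relative gain over the base model and shows how such a figure is typically derived from binary confusion-matrix counts. The function names and the counts in the usage block are hypothetical and illustrative only. They are not the paper's evaluation code or results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1 from confusion-matrix counts; returns 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new: float, base: float) -> float:
    """Relative gain of `new` over `base`, expressed as a percentage."""
    if base == 0:
        raise ValueError("base score must be non-zero")
    return (new - base) / base * 100.0


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not results from the paper.
    base_f1 = f1_score(tp=40, fp=30, fn=30)
    tuned_f1 = f1_score(tp=60, fp=15, fn=10)
    gain = relative_improvement(tuned_f1, base_f1)
    print(f"base F1={base_f1:.3f}, fine-tuned F1={tuned_f1:.3f}, gain={gain:.1f}%")
```

Under this reading, an 89% relative gain means the fine-tuned F1 is about 1.89 times the base model's F1. An absolute gain would instead be reported in percentage points.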
Abstract
Traffic safety remains a vital concern in contemporary urban settings, intensified by the growing number of vehicles and the complexity of road networks. Traditional safety-critical event detection systems predominantly rely on sensor-based approaches and conventional machine learning algorithms, which require extensive data collection and complex training processes to adhere to traffic safety regulations. This paper introduces HazardNet, a small-scale Vision Language Model designed to enhance traffic safety by leveraging the reasoning capabilities of advanced language and vision models. We built HazardNet by fine-tuning the pre-trained Qwen2-VL-2B model, chosen for its superior performance among open-source alternatives and its compact size of two billion parameters. This compact size facilitates deployment on edge devices with efficient inference throughput. In addition, we present…
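One reason a two-billion-parameter model suits edge hardware is the memory its weights occupy. The back-of-envelope sketch below multiplies the parameter count stated above by the storage size of each numeric format. It is an assumption-laden estimate, not a measurement. It counts weights only and ignores activations, the KV cache, image tokens from the vision encoder, and runtime overhead. The precise parameter count of Qwen2-VL-2B may also differ slightly from the rounded two billion. The dtype table and function name are illustrative.

```python
# Bytes needed to store one parameter in common numeric formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for model weights alone, in GiB (2**30 bytes)."""
    if dtype not in BYTES_PER_PARAM:
        raise KeyError(f"unknown dtype: {dtype}")
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    params = 2e9  # "two billion parameters", as stated in the abstract
    for dtype in ("fp32", "fp16", "int8", "int4"):
        print(f"{dtype}: ~{weight_memory_gib(params, dtype):.2f} GiB of weights")
```

At 16-bit precision this comes to roughly 3.7 GiB of weights. Lower-precision quantization shrinks that proportionally, which is the usual lever for fitting such a model on memory-constrained devices.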
