Vision Language Model for Interpretable and Fine-grained Detection of Safety Compliance in Diverse Workplaces
Zhiling Chen, Hanning Chen, Mohsen Imani, Ruimin Chen, and Farhad Imani

TL;DR
This paper presents Clip2Safety, a vision-language model framework that improves the accuracy and speed of PPE compliance detection and fine-grained attribute verification in diverse workplaces, enhancing safety monitoring.
Contribution
Introduction of Clip2Safety, a novel interpretable detection framework combining scene recognition, visual prompts, and fine-grained verification for PPE compliance across various scenarios.
Findings
Outperforms state-of-the-art VLMs in accuracy.
Achieves inference times 200 times faster than those state-of-the-art VLMs.
Validated across six real-world workplace scenarios.
Abstract
Workplace accidents due to personal protective equipment (PPE) non-compliance raise serious safety concerns and lead to legal liabilities, financial penalties, and reputational damage. While object detection models have shown the capability to address this issue by identifying safety items, most existing models, such as YOLO, Faster R-CNN, and SSD, are limited in verifying the fine-grained attributes of PPE across diverse workplace scenarios. Vision language models (VLMs) are gaining traction for detection tasks by leveraging the synergy between visual and textual information, offering a promising solution to traditional object detection limitations in PPE recognition. Nonetheless, VLMs face challenges in consistently verifying PPE attributes due to the complexity and variability of workplace environments, requiring them to interpret context-specific language and visual cues…
Taxonomy
Topics: Occupational Health and Safety Research · Safety Warnings and Signage · Risk and Safety Analysis
Methods: RoIPool · 1x1 Convolution · Region Proposal Network · Softmax · Faster R-CNN · Convolution · Non Maximum Suppression · SSD
