Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos

Md Zahid Hasan; Jiajing Chen; Jiyang Wang; Mohammed Shaiqur Rahman; Ameya Joshi; Senem Velipasalar; Chinmay Hegde; Anuj Sharma; Soumik Sarkar

arXiv:2306.10159 · cs.CV · March 22, 2024 · 1 citation

TL;DR

This paper introduces a CLIP-based framework for recognizing distracted driver behaviors from naturalistic videos, achieving state-of-the-art zero-shot and fine-tuned performance with limited annotated data.

Contribution

It presents a novel application of vision-language models such as CLIP to distracted driving detection, enabling effective zero-shot and few-shot learning from naturalistic driving videos.
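The summary does not say how the few-shot setting is implemented; the TL;DR only mentions fine-tuning. One simple way to use a handful of labeled frames is a nearest-prototype classifier over image embeddings. Each class is represented by the mean of its few normalized examples, and a query frame takes the label of the most cosine-similar prototype. This is a swapped-in technique for illustration, not the authors' method. The class names and 3-D vectors are hypothetical stand-ins for real CLIP image embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (CLIP compares embeddings by cosine similarity)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def build_prototypes(support):
    """Average the normalized embeddings of each class's few labeled examples."""
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        units = [normalize(e) for e in embeddings]
        mean = [sum(u[d] for u in units) / len(units) for d in range(dim)]
        prototypes[label] = normalize(mean)
    return prototypes


def classify(query, prototypes):
    """Return the class whose prototype has the highest cosine similarity to the query."""
    q = normalize(query)
    return max(prototypes, key=lambda label: sum(a * b for a, b in zip(q, prototypes[label])))


# Hypothetical toy 3-D vectors standing in for CLIP image embeddings of labeled frames.
support = {
    "texting": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "drinking": [[0.1, 0.9, 0.1], [0.0, 0.8, 0.2]],
    "safe driving": [[0.1, 0.1, 0.9], [0.2, 0.0, 0.8]],
}
prototypes = build_prototypes(support)
print(classify([0.85, 0.15, 0.05], prototypes))  # texting
```

Adding a new class only requires a few labeled embeddings and no retraining. A fine-tuned model would typically be more accurate when more data is available.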

Findings

01. State-of-the-art zero-shot performance on public datasets
02. Effective frame-based and video-based detection frameworks
03. Robust distracted activity classification with limited data
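This summary does not describe how the paper's video-based framework works; it may model temporal information directly. As an assumed baseline only, frame-level class probabilities can be converted to clip-level labels by averaging them over a window. Averaging smooths out single misclassified frames. The function names, window size, stride, and probability values below are hypothetical.

```python
def aggregate_clip(frame_probs, labels):
    """Average per-frame class probabilities; return (top label, mean probabilities)."""
    num_frames = len(frame_probs)
    mean = [sum(frame[c] for frame in frame_probs) / num_frames for c in range(len(labels))]
    best = max(range(len(labels)), key=lambda c: mean[c])
    return labels[best], mean


def sliding_window_labels(frame_probs, labels, window, stride):
    """Label each fixed-length window of a long video by averaging its frames."""
    results = []
    for start in range(0, len(frame_probs) - window + 1, stride):
        label, _ = aggregate_clip(frame_probs[start:start + window], labels)
        results.append((start, label))
    return results


labels = ["safe driving", "texting", "drinking"]
# Hypothetical per-frame softmax outputs from a frame-level classifier.
frame_probs = [
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],  # on its own, this frame looks like safe driving
    [0.1, 0.8, 0.1],
]

label, mean = aggregate_clip(frame_probs, labels)
print(label)  # texting
print(sliding_window_labels(frame_probs, labels, window=2, stride=2))  # [(0, 'texting'), (2, 'texting')]
```

In the example, the third frame alone favors "safe driving", but the clip-level average still selects "texting". Windowing lets a long naturalistic recording receive a sequence of activity labels instead of a single one.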

Abstract

Recognizing the activities causing distraction in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, thereby limiting their efficiency and scalability. We aim to develop a generalized framework that performs robustly with limited or no annotated training data. Recently, vision-language models have offered large-scale visual-textual pretraining that can be adapted to task-specific learning such as distracted driving activity recognition. Vision-language pretraining models, such as CLIP, have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based…
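CLIP-style zero-shot recognition works by comparing an image embedding with text embeddings, one prompt per class, and picking the most similar prompt. The minimal sketch below shows only that scoring step under stated assumptions. The prompt template, activity names, and 3-D vectors are hypothetical; in practice the vectors would come from a CLIP image encoder and text encoder. The logit scale of 100 roughly matches the value learned by released CLIP checkpoints. The paper's actual prompts and settings are not given in this summary.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(image_emb, text_embs, logit_scale=100.0):
    """Class probabilities from scaled cosine similarity between one image and each prompt."""
    img = normalize(image_emb)
    logits = [logit_scale * sum(a * b for a, b in zip(img, normalize(t))) for t in text_embs]
    return softmax(logits)


activities = ["driving safely", "texting on a phone", "drinking from a bottle"]
prompts = [f"a photo of a driver {a}" for a in activities]  # hypothetical template
# Hypothetical stand-ins for text-encoder outputs (one per prompt) and an image-encoder output.
text_embs = [[0.1, 0.1, 0.9], [0.9, 0.1, 0.1], [0.1, 0.9, 0.1]]
frame_emb = [0.7, 0.2, 0.1]

probs = zero_shot_probs(frame_emb, text_embs)
best = max(range(len(prompts)), key=lambda i: probs[i])
print(prompts[best])  # a photo of a driver texting on a phone
```

No labeled driving data is needed: the class set is defined entirely by the prompt list. This is why prompt wording can noticeably affect zero-shot accuracy.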

Code & Models

Repositories

zahid-isu/driveclip (PyTorch · Official)

Taxonomy

Topics: Gaze Tracking and Assistive Technology · Human Pose and Action Recognition · Human-Automation Interaction and Safety

Methods: Contrastive Language-Image Pre-training
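Contrastive Language-Image Pre-training trains the image and text encoders with a symmetric cross-entropy over the batch similarity matrix. Each image must pick out its own caption among all captions in the batch, and each caption must pick out its own image. The pure-Python sketch below computes that loss for a toy batch. The fixed temperature of 0.07 is an assumption: CLIP learns the temperature, starting from that value. This summary does not state whether the paper's fine-tuning uses exactly this objective.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm so dot products are cosine similarities."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def log_softmax(row):
    """Numerically stable log-softmax of a list of logits."""
    m = max(row)
    log_total = m + math.log(sum(math.exp(z - m) for z in row))
    return [z - log_total for z in row]


def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss for N matched (image, text) pairs.

    Pair i is the positive for row i and column i of the similarity matrix;
    every other entry in the batch serves as a negative.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)
    logits = [[sum(a * b for a, b in zip(imgs[i], txts[j])) / temperature
               for j in range(n)] for i in range(n)]
    # Image-to-text: softmax over captions in each row; the target is the diagonal entry.
    loss_i2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n
    # Text-to-image: softmax over images in each column; the target is the diagonal entry.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    loss_t2i = -sum(log_softmax(columns[j])[j] for j in range(n)) / n
    return (loss_i2t + loss_t2i) / 2


# Toy batch of two images and two captions (hypothetical 2-D embeddings).
images = [[1.0, 0.0], [0.0, 1.0]]
aligned_texts = [[1.0, 0.0], [0.0, 1.0]]  # caption i matches image i
swapped_texts = [[0.0, 1.0], [1.0, 0.0]]  # captions in the wrong order

print(clip_contrastive_loss(images, aligned_texts))  # close to 0
print(clip_contrastive_loss(images, swapped_texts))  # about 14.29 (that is, 1 / 0.07)
```

The loss is near zero when every image is most similar to its own caption. It grows when captions are mismatched, which pushes matched pairs together and unmatched pairs apart in the shared embedding space.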