Zero-Shot Distracted Driver Detection via Vision Language Models with Double Decoupling
Takamichi Miyata, Sumiko Miyata, Andrew Morris

TL;DR
This paper introduces a novel zero-shot distracted driver detection method using vision-language models, employing a double decoupling framework to improve robustness by isolating behavior cues from appearance variations.
Contribution
We propose a subject decoupling framework and orthogonalize text embeddings to enhance zero-shot distracted driver detection with vision-language models.
Findings
Consistent performance improvements over prior baselines
Effective removal of appearance bias enhances detection accuracy
Potential for practical road-safety applications
Abstract
Distracted driving is a major cause of traffic collisions, calling for robust and scalable detection methods. Vision-language models (VLMs) enable strong zero-shot image classification, but existing VLM-based distracted driver detectors often underperform in real-world conditions. We identify subject-specific appearance variations (e.g., clothing, age, and gender) as a key bottleneck: VLMs entangle these factors with behavior cues, leading to decisions driven by who the driver is rather than what the driver is doing. To address this, we propose a subject decoupling framework that extracts a driver appearance embedding and removes its influence from the image embedding prior to zero-shot classification, thereby emphasizing distraction-relevant evidence. We further orthogonalize text embeddings via metric projection onto the Stiefel manifold to improve separability while staying close to the…
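The two steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Two assumptions are made here: (1) "removing the influence" of the appearance embedding is modeled as subtracting the image embedding's component along the appearance direction, which is only one plausible reading; (2) the metric projection onto the Stiefel manifold is computed as the orthogonal polar factor obtained from an SVD, which is the standard nearest-orthonormal matrix in Frobenius norm for a full-rank input. All function names and the random embeddings are hypothetical stand-ins for real VLM outputs.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to (approximately) unit L2 norm."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def remove_appearance(image_emb, appearance_emb):
    """Assumed decoupling step: drop the component of the image
    embedding that lies along the driver appearance embedding."""
    a = l2_normalize(appearance_emb)
    residual = image_emb - np.dot(image_emb, a) * a
    return l2_normalize(residual)

def project_to_stiefel(text_embs):
    """Nearest matrix with orthonormal rows (Frobenius norm),
    i.e. the polar factor U @ Vt of a full-rank (K, D) matrix, K <= D."""
    u, _, vt = np.linalg.svd(text_embs, full_matrices=False)
    return u @ vt

def zero_shot_classify(image_emb, appearance_emb, text_embs):
    """Score each class prompt against the decoupled image embedding."""
    z = remove_appearance(image_emb, appearance_emb)
    t = project_to_stiefel(text_embs)
    scores = t @ z
    return int(np.argmax(scores)), scores

# Synthetic stand-ins for VLM embeddings (hypothetical data).
rng = np.random.default_rng(0)
num_classes, dim = 4, 16
text_embs = l2_normalize(rng.normal(size=(num_classes, dim)))
appearance_emb = rng.normal(size=dim)
# An image embedding mixing a behavior cue with a strong appearance cue.
image_emb = text_embs[2] + 2.0 * l2_normalize(appearance_emb)

pred, scores = zero_shot_classify(image_emb, appearance_emb, text_embs)
print("predicted class index:", pred)
```

In a real pipeline, `image_emb` and `text_embs` would come from a VLM's image and text encoders, and `appearance_emb` from whatever appearance extractor the method uses; the abstract does not specify these details.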
Taxonomy
Topics: Multimodal Machine Learning Applications · Autonomous Vehicle Technology and Safety · Human-Automation Interaction and Safety
