Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers
Samay Lakhani

TL;DR
This paper explores the use of vision transformers for detecting drowsy and distracted driving, showing promising results for distraction detection but limited success for drowsiness due to data constraints.
Contribution
The paper applies vision transformers to driving safety: its distraction model outperforms previous models, while its drowsiness results highlight the challenges that remain in drowsiness detection.
Findings
Distracted driving model achieved 97.5% accuracy.
Drowsiness detection model reached only 44% accuracy.
Transformers show potential with sufficient data and architecture improvements.
Abstract
Car crashes rose by 20% in 2021 compared to 2020 as a result of increased distraction and drowsiness, and drowsy and distracted driving cause 45% of all car crashes. Computer-vision detection methods can help reduce drowsy and distracted driving because they can be designed to be low-cost, accurate, and minimally invasive. This work investigated whether vision transformers can outperform the state-of-the-art accuracy of 3D-CNNs. Two separate transformers were trained, one for drowsiness and one for distraction. The drowsiness model, a Video Swin Transformer, was trained for 10 epochs on the National Tsing-Hua University Drowsy Driving Dataset (NTHU-DDD), using two classes (drowsy and non-drowsy) covering 10.5 hours of simulated driving. The distraction model, also a Video Swin Transformer, was trained on the Driver Monitoring Dataset (DMD) for 50 epochs…
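Both models use the Video Swin Transformer, whose spatiotemporal attention is the mechanism named in the title. The NumPy sketch below illustrates the core idea in a simplified form. It is not the author's code. The video is treated as a (time, height, width) grid of tokens and split into non-overlapping 3D windows, and self-attention is computed only among tokens in the same window. Several simplifications are assumed: a single head, no patch embedding, no shifted windows, and no relative position bias. The function names and the caller-supplied projection matrices `wq`, `wk`, `wv` are hypothetical.

```python
import numpy as np


def window_partition_3d(x, window):
    """Split a (T, H, W, C) token grid into non-overlapping 3D windows.

    Returns shape (num_windows, wt * wh * ww, C). Assumes T, H, W are
    divisible by the window size (padding is omitted in this sketch).
    """
    T, H, W, C = x.shape
    wt, wh, ww = window
    x = x.reshape(T // wt, wt, H // wh, wh, W // ww, ww, C)
    # Group by window index (t, h, w) first, then by position inside the window.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, wt * wh * ww, C)


def window_merge_3d(windows, window, shape):
    """Inverse of window_partition_3d: reassemble windows into (T, H, W, C)."""
    T, H, W, C = shape
    wt, wh, ww = window
    x = windows.reshape(T // wt, H // wh, W // ww, wt, wh, ww, C)
    x = x.transpose(0, 3, 1, 4, 2, 5, 6)
    return x.reshape(T, H, W, C)


def softmax(z, axis=-1):
    """Numerically stable softmax along `axis`."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def window_self_attention(x, window, wq, wk, wv):
    """Single-head self-attention computed independently inside each 3D window.

    x: (T, H, W, C) grid of video tokens (time, height, width, channels).
    wq, wk, wv: (C, C) query/key/value projections (hypothetical, supplied by
    the caller; square so the output fits back into the (T, H, W, C) grid).
    """
    windows = window_partition_3d(x, window)  # (num_windows, N, C)
    q, k, v = windows @ wq, windows @ wk, windows @ wv
    # Each token attends only to the N tokens in its own spatiotemporal
    # window, i.e. to nearby positions in both space and time.
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(k.shape[-1])  # (num_windows, N, N)
    out = softmax(scores) @ v  # (num_windows, N, C)
    return window_merge_3d(out, window, x.shape)
```

With a (2, 2, 2) window on a 4×4×4 token grid, each of the 64 tokens attends to 8 tokens rather than all 64. Setting the window to the full grid recovers ordinary global attention. Video Swin additionally shifts the windows between consecutive blocks so that information can flow across window boundaries.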
Taxonomy
Topics: Sleep and Work-Related Fatigue · IoT and GPS-based Vehicle Safety Systems · Occupational Health and Safety Management
Methods: Attention Is All You Need · Test · Linear Layer · Stochastic Depth · Position-Wise Feed-Forward Layer · Softmax · Byte Pair Encoding · Vision Transformer · Adam · Label Smoothing
