Long-term Multi-granularity Deep Framework for Driver Drowsiness Detection
Jie Lyu, Zejian Yuan, Dapeng Chen

TL;DR
This paper introduces a novel deep learning framework combining multi-granularity CNNs and LSTM to effectively detect driver drowsiness from videos, handling large head pose variations and temporal dependencies, achieving state-of-the-art accuracy.
Contribution
The paper proposes a new multi-granularity CNN-LSTM framework for driver drowsiness detection, addressing head pose variation and temporal dependencies, and introduces a new high-precision dataset.
Findings
Achieves 90.05% accuracy on the NTHU-DDD dataset
Runs in real time at about 37 fps
Sets a new state-of-the-art in driver drowsiness detection
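As a quick reading of the reported throughput, the frame rate can be converted into an average per-frame time budget. The sketch below does only that arithmetic; the helper name `frame_budget_ms` and the 30 fps camera rate used for comparison are illustrative assumptions, not values taken from the paper.

```python
def frame_budget_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


reported_fps = 37.0  # "about 37 fps", as listed in the findings
camera_fps = 30.0    # assumed camera rate, for comparison only

model_ms = frame_budget_ms(reported_fps)   # roughly 27 ms per frame
camera_ms = frame_budget_ms(camera_fps)    # roughly 33 ms between frames

# The detector keeps pace if it needs no more time per frame than the camera provides.
keeps_up = model_ms <= camera_ms
print(f"{model_ms:.1f} ms per frame; keeps up with camera: {keeps_up}")
```

Under the assumed 30 fps input, a detector running at about 37 fps finishes each frame before the next one arrives, which is one common reading of "real time".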
Abstract
For real-world driver drowsiness detection from videos, head pose varies so widely (e.g., looking aside or lowering the head) that existing methods operating on the global face cannot extract effective features. Temporal dependencies of variable length, such as those in yawning and speaking, are also rarely considered by previous approaches. In this paper, we propose a Long-term Multi-granularity Deep Framework to detect driver drowsiness in driving videos containing frontal faces. The framework includes two key components: (1) Multi-granularity Convolutional Neural Network (MCNN), a novel network that applies a group of parallel CNN extractors to well-aligned facial patches of different granularities and extracts facial representations that remain effective under large variations in head pose; furthermore, it can flexibly fuse both detailed appearance clues of the main parts and local to…
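The data flow described above (parallel CNN extractors on facial patches of different granularities, feature fusion, then an LSTM over time) can be sketched roughly as follows. This is a minimal illustration in PyTorch, not the authors' architecture: the class names, layer sizes, the choice of three patches, the patch resolutions, and fusion by concatenation are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class PatchExtractor(nn.Module):
    """Small CNN for one facial patch (hypothetical layer sizes)."""

    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # pools any patch size down to 1x1
            nn.Flatten(),
            nn.Linear(16, feat_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


class MultiGranularityLSTM(nn.Module):
    """Parallel patch extractors, concatenation fusion, LSTM over frames."""

    def __init__(self, num_patches: int = 3, feat_dim: int = 32,
                 hidden: int = 64, num_classes: int = 2):
        super().__init__()
        # One extractor per granularity (assumed: e.g. whole face, half face, part).
        self.extractors = nn.ModuleList(
            [PatchExtractor(feat_dim) for _ in range(num_patches)]
        )
        self.lstm = nn.LSTM(num_patches * feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, patches: list) -> torch.Tensor:
        # patches: one tensor per granularity, each (batch, time, 3, H_i, W_i)
        feats = []
        for extractor, p in zip(self.extractors, patches):
            b, t = p.shape[:2]
            f = extractor(p.reshape(b * t, *p.shape[2:]))  # fold time into batch
            feats.append(f.reshape(b, t, -1))
        fused = torch.cat(feats, dim=-1)   # assumed fusion: simple concatenation
        out, _ = self.lstm(fused)          # temporal modelling across frames
        return self.head(out)              # per-frame logits


model = MultiGranularityLSTM()
clip = [
    torch.randn(2, 5, 3, 64, 64),  # coarse patch (assumed size)
    torch.randn(2, 5, 3, 32, 32),  # medium patch (assumed size)
    torch.randn(2, 5, 3, 16, 16),  # fine patch (assumed size)
]
logits = model(clip)  # shape: (batch=2, time=5, classes=2)
```

Giving each granularity its own extractor lets patches of different resolution be processed independently before fusion, and the LSTM then sees one fused feature vector per frame.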
Taxonomy
Topics: Sleep and Work-Related Fatigue · Video Surveillance and Tracking Methods · Fire Detection and Safety Systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Memory Network
