DIF : Dataset of Perceived Intoxicated Faces for Drunk Person Identification
Vineet Mehta, Devendra Pratap Yadav, Sai Srinadhu Katta, Abhinav Dhall

TL;DR
This paper introduces DIF, a novel audio-visual dataset for detecting intoxicated individuals, and proposes bimodal deep learning methods for non-invasive intoxication detection to improve road safety.
Contribution
The work presents the first dataset of perceived intoxicated faces and develops bimodal CNN and DNN models for automatic intoxication detection.
Findings
CNN and DNN baselines established for the video and audio data, respectively
3D CNN captures spatio-temporal video features
Proposed non-linearity variation enhances 3D convolution effectiveness
Abstract
Traffic accidents cause over a million deaths every year, of which a large fraction is attributed to drunk driving. An automated intoxicated driver detection system in vehicles will be useful in reducing accidents and related financial costs. Existing solutions require special equipment such as electrocardiograms, infrared cameras or breathalyzers. In this work, we propose a new dataset called DIF (Dataset of perceived Intoxicated Faces) which contains audio-visual data of intoxicated and sober people obtained from online sources. To the best of our knowledge, this is the first work for automatic bimodal non-invasive intoxication detection. Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN) are trained for computing the video and audio baselines, respectively. A 3D CNN is used to exploit the spatio-temporal changes in the video. A simple variation of the traditional 3D…
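To illustrate what "spatio-temporal" means for a 3D convolution, here is a minimal NumPy sketch. It is not the authors' architecture or their proposed variation: the clip size, the kernel, and the helper name `conv3d_valid` are all assumptions chosen for illustration. The point is only that a 3D kernel spans several frames at once, so its response depends on how pixels change over time, not just on a single frame.

```python
import numpy as np

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over (time, height, width).

    Hypothetical helper for illustration; real models would use an
    optimized library layer instead of Python loops.
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # Each output value sees a small block of pixels across kt frames.
                out[t, y, x] = np.sum(clip[t:t + kt, y:y + kh, x:x + kw] * kernel)
    return out

# Assumed toy input: 8 grayscale frames of 16x16 pixels (not DIF data).
rng = np.random.default_rng(0)
clip = rng.random((8, 16, 16))

# Assumed kernel: local mean of frame t+1 minus local mean of frame t,
# so it responds to change between consecutive frames.
kernel = np.zeros((2, 3, 3))
kernel[0] = -1.0 / 9
kernel[1] = 1.0 / 9

features = conv3d_valid(clip, kernel)

# A clip made of one repeated frame has no temporal change,
# so this particular kernel produces (numerically) zero response.
static_clip = np.repeat(clip[:1], 8, axis=0)
static_features = conv3d_valid(static_clip, kernel)

print(features.shape)
print(np.allclose(static_features, 0.0))
```

A 2D convolution applied frame by frame could not distinguish the moving clip from the static one with a kernel like this, which is the motivation for using 3D kernels on video.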
Taxonomy
Methods: 3D Convolution · Convolution
